- Manage the availability, health, and service levels (SLAs, SLOs) of large-scale cloud infrastructure running in AWS and GCP.
- Proactively monitor, diagnose, and analyze failures, and support software engineers in debugging production issues across microservices and distributed platforms.
- Participate in the on-call rotation and resolve issues in a 24x7 multi-cloud (AWS/GCP) environment.
- Monitor metrics and performance of applications and cloud infrastructure.
- Manage code releases, i.e., push code and patches to the cloud.
- Own the entire incident lifecycle (incident management): reporting, analyzing, and handling incidents through to closure, and writing RCAs.
- Maintain a laser focus on, and be able to analyze, scalability, reliability, high availability, performance, software maintainability, and operational challenges.
- Write and maintain runbooks for knowledge-driven automated processes and bots.
- Perform capacity planning based on performance, usage, and utilization stats.
- Perform after-hours infrastructure updates and maintenance.
- Follow SRE best practices and procedures.
- Apply project management methodologies to coordinate tasks and timelines efficiently.
- Build infrastructure as code using Terraform, Ansible, and Kubernetes.
- Manage and performance-tune either databases or streaming data pipelines.
- Manage CI/CD pipelines, configuration, and automation tools for infrastructure provisioning.
- Perform capacity planning based on performance, usage, and utilization statistics to improve decision-making and optimize system performance.
- Partner with developers and quality engineering teams to automate the monitoring, alerting, and scalability of our applications and systems.
- Bachelor's degree in Computer Science or Computer Engineering or equivalent.
- Minimum 5 years of DevOps/SRE experience.
- 3 years' experience working with AWS and/or GCP.
- Must have technical experience working with EC2 (GCE), IAM, S3 (GS), Kubernetes pods, Jenkins, Prometheus, CloudWatch (Stackdriver), Linux, and shell scripting.
- Basic understanding of Terraform, CloudFormation, or another IaC tool is preferred.
- General understanding of distributed systems.
- Understanding of data management technologies including relational and non-relational databases.
- Hands-on experience operating large-scale, cloud-based distributed applications.
- The ability to "fix the plane while in flight".
Reliability Engineer 4 - Dallas, United States - Juniper Networks
Description
Site Reliability Engineer 4
Location: Anywhere in the United States
Juniper is changing what's possible in networking. We're going beyond building the networks customers expect - we're building the networks customers deserve. And the world is taking note. But to continue to excel, we have work to do. Change in our industry is accelerating. To power connections and empower change, we need radical thinkers, eternal optimists, and energized personalities. We need people like you.
Success requires big thinking and high-reaching goals. Our culture breeds innovation. Here, you will have the opportunity to take chances and let your ideas grow. You will be supported by thoughtful, inclusive, and accessible leaders. You will have every chance to be a part of the conversation and seize our momentum. Your career will be better for it.
At Juniper, we strive to deliver network experiences that transform how people connect, work, and live. We Power Connections and Empower Change, and we do that through our core values: Being Bold, Building Trust, and Delivering Excellence.
Do you want to solve complex problems and build systems that will change the Internet? Do you want to be part of a company that is on the cutting edge of technology? Do you want to work with a world-class team of engineers?
Juniper is seeking a full-time SRE to join our talented team and support high-quality technology solutions that revolutionize wireless and wired networks, powered by Artificial Intelligence in the cloud. Juniper provides services through SaaS applications to several enterprises, including Fortune 100 and Fortune 500 customers. You will be responsible for maintaining and improving the company's production environment for rapid scaling and outstanding performance. You will maintain stellar cloud uptime and reliability. Your primary responsibilities will be incident management and release management for cloud instances across various regions.
Responsibilities:
Maximum Salary: $147,752.00
The pay range for this position is expected to be between $102,784.00 and $147,752.00/year; however, the base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position also includes medical benefits, 401(k) eligibility, vacation, sick time, and parental leave. Additional details of participation in these benefit plans will be provided if an employee receives an offer of employment.
If hired, employee will be in an "at-will position" and the Company reserves the right to modify base salary (as well as any other payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.
Juniper's pay range data is provided in accordance with local state pay transparency regulations. Juniper may post different minimum wage ranges for permanent residency petitions pursuant to US Department of Labor requirements.