- Teaching peers about monitoring & observability best practices.
- Guiding & reinforcing proper use of our toolset to improve the quality, reliability & availability of the services our teams offer.
- Implementing and enhancing monitoring of the hardware & software across our ecosystem:
  - Developing and improving instrumentation and integrations.
  - Providing guidance on monitoring best practices.
  - Providing guidance on monitoring specific hardware & software items (key points to monitor).
- Implementing and enhancing observability of products & platforms across our ecosystem:
  - Developing and improving instrumentation.
  - Providing guidance on key areas to observe.
  - Educating teams on how observability tools work.
- Ensuring our internal customers have the best possible monitoring & observability, helping them raise the quality, reliability & availability of corporate IT infrastructure.
- Scripting / Infrastructure as Code / Process Creation for monitoring & observability implementations & enhancements to lower overhead & improve efficiency.
- Experience with monitoring, observability, and site reliability engineering.
- Engineering degree or equivalent experience and familiarity with engineering best practices.
- Working knowledge of how hardware & software interact in a corporate retail environment.
- Experience with Azure / Azure DevOps.
- Deeper knowledge in one or more of the following domains of hardware/software:
  - Application Servers (IIS, Tomcat, WebSphere, JBoss, etc.)
  - Containerization (Kubernetes, VMware, etc.)
  - Database (SQL Server, Postgres, DB2, Oracle, etc.)
  - Message Bus (IBM MQ, Kafka, ActiveMQ, RabbitMQ)
  - Networking (Cisco ACI, F5 Load Balancers, Firewalls, etc.)
  - Operating Systems (Red Hat, Windows, etc.)
  - Programming (Java, .NET, Python, etc.)
  - Storage Devices
  - Web Servers (Apache, NGINX, etc.)
- Familiarity with the Agile Scrum process.
- Ability to interact with a variety of personalities and technical skill levels across multiple product & platform teams.
- Proficient in developing and maintaining technical documentation.
- Experience with:
  - Datadog
  - Nagios
  - ServiceNow Event Management / Service Operations Workspace
- Knowledge of the Google Site Reliability Engineering model.
- Experience with Infrastructure as Code / Configuration Management tools:
  - Terraform
  - Ansible
  - Azure DevOps
- Skills in troubleshooting production environments (this is not a day-to-day responsibility of this role but this experience will prove valuable as we build the tools those teams utilize).
- Strong ownership attitude / track record of taking responsibility.
Site Reliability Engineer - Plymouth Meeting, United States - RIT Solutions, Inc.
Description
Remote, EST hours. Candidates need an established LinkedIn profile (not newly created) with a profile picture and strong connections.
Responsibilities: