Site Reliability Engineer - Springfield, United States - Azimuth

Azimuth Springfield, United States

3 weeks ago

Description

Azimuth Corporation is seeking a qualified Site Reliability Engineer to support an NGA Research customer on-site in the Springfield, VA area. The ideal candidate will provide expert support to the Labs and Data pod.


This team works to maximize the mission utility of increasingly diverse visual data sources using automated, scalable computational methods. It applies rigorous testing and evaluation to align data, technology, and tradecraft. Key areas include monitoring and instrumentation (implementing metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations), engineering practices for availability, reliability, scalability, and disaster recovery, and using and contributing code in GitLab.

Execution

Identifies significant projects that result in substantial improvements in reliability, cost savings and/or revenue.
Identifies changes to the product architecture from the reliability, performance, and availability perspectives, using a data-driven approach.
Influences the product roadmap and works with engineering and product counterparts to improve the resiliency and reliability of the GitLab product.
Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage, making GitLab cheaper to run for all our customers.
Identifies parts of the system that do not scale, provides immediate palliative measures, and drives long-term resolution of these incidents.
Identifies Service Level Indicators (SLIs) that align the team to meet availability and latency objectives.

Requirements for this role include, but are not limited to, the following:

Role: Owns RLEWAN & HARLEM availability, feature performance, and their deployments

Technical

General knowledge of most technical expertise areas, with deep knowledge in two of them.
Advanced Chef (syntax, recipes, cookbooks) and Ansible (syntax, tasks, playbooks)
Advanced Terraform syntax and GitLab CI/CD configuration, pipelines, jobs
Advanced knowledge of cloud services
Kubernetes: cluster provisioning and new services
Prometheus, Thanos, and Grafana: service catalog metrics and recording rules for alerts
Log shipping pipelines and incident debugging visualizations
Operating system (Linux) configuration, package management, startup and troubleshooting
Block and object storage configuration and debugging
Working knowledge of overall GitLab Product, including deep knowledge of groups which may be part of stable counterpart assignments.
Contributes improvements to the GitLab codebase to resolve issues.
Must-haves:

Submitted 1 approved swap

Submitted 1 approved SIDR

Submitted 2 service+ tickets

Has worked at least 1 UC2S/SC2S FCR and 1 TC2S FCR from beginning to end

Has worked at least 1 ATO process with RLE Security, DAOR, and SCA

At least 3 years of Linux (CentOS/Red Hat 7) OS administration

At least 2 years of bare-metal administration

At least 1 year of Git CLI use

Experience with GitLab, including use of the GitLab web UI for issues and code review

Experience with Ansible, ThreadFix, Terraform, and Docker

Able to use the ThreadFix web UI to work off vulnerability findings

Able to use GitLab CI pipelines

1 year of Ansible

1 year of Terraform

1 year of Docker Engine installation/administration

1 year of RLE HPC CLI tools

Knowledge of at least one of the following languages is a plus: Python, Java, C/C++, Ruby, Shell, or JavaScript.

Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).

Previous success in technical engineering

Coding experience beyond simple scripts

Access/privileges:

SCI clearance

PRGMADMIN to U|S|T|C2S rle accounts

sbu/secnet/coe access

rle-u|sc|tc admin access

rle-u|sc|tc HPC admin access

NCE data center access

Responsibilities for this role include, but are not limited to, the following:

Implement and maintain RLEWAN Monitoring App plugins that monitor HPC and other on-premises system resources
Coordinate and integrate with broader NGA Enterprise continual monitoring services

Implement and maintain on-premises components of HARLEM, MIQ, and the RLEWAN Monitoring App, as designated by the Product Owner

Work with the Security Engineer to provide and resolve security-relevant on-premises technical details during the ATO of HARLEM, the RLEWAN Monitoring Application, Kubernetes, and their subcomponents
Interact with the team using GEOINT Services tooling designated by the Product Owner
Record daily status updates in GEOINT Services tooling designated by the Product Owner
Sustain and improve the existing HARLEM, MIQ, and RLEWAN Monitoring App CI/CD pipelines

Use existing UC2S CI/CD pipelines as a guide for implementing similar configurations in other domains/environments, as designated by the Product Owner

Attend weekly team standups and monthly briefings with team leadership

Ensure all work follows;
Work with the Task Coordinator and provide requested status updates

Requirements:

Must be able to work in an office environment at a desk and computer.

Company Overview

Azimuth is an award-winning Woman-Owned Small Business specializing in research and development and professional services support to the federal government. Azimuth's agility, customer-driven approach, and commitment to our employees allow us to meet and exceed our clients' goals. Excellence, Integrity, Accountability, Community, and Humility are the core values of Azimuth as we continue to strive to be a recognized leader in the management consulting community, one that both federal agencies and industry partners value doing business with. We are an organization that offers both our employees and clients an exceptional experience; our culture is contagious, and we always maintain a genuine reputation.

Disclaimer

The above information in this description has been designed to indicate the general nature and level of work performed by employees within this classification. It is not designed to contain, or to be interpreted as, a comprehensive inventory of all duties, responsibilities, and qualifications required of employees assigned to this job.

Azimuth Corporation does not discriminate in employment on the basis of race, color, religion, sex (including pregnancy and gender identity), national origin, political affiliation, sexual orientation, marital status, disability, genetic information, age, membership in an employee organization, retaliation, parental status, military service, or other non-merit factor.

