- Design, build, implement, and maintain platform tooling that improves the reliability, availability, scalability, latency, and efficiency of services across the entire product surface area
- Manage end-to-end distributed systems availability and ensure high performance of applications
- Build automation solutions to prevent problem recurrence
- Build visibility into SLIs, SLOs, SLAs, and dependency metrics to manage operational burden and systems reporting
- Design, build, implement, and maintain an observability ecosystem that provides visibility across platform services and applications
- Proactively identify risks and develop engineering processes and/or tooling to reduce availability risk
- Evangelize best practices and mentor service owners on reliability, resiliency, and scalability for new and existing services and/or features
- Participate in an on-call rotation and hold retrospective root cause analysis meetings, focusing on identifying remediations and product resiliency opportunities
- At least 7 years of experience working in medium or large scale production systems
- The ability to take a systematic approach to analyzing, troubleshooting, and diagnosing system problems to identify, locate, resolve, and repair problems
- Experience in software development or systems engineering with code
- Experience designing for scale and automation-forward ecosystems and solutions
- Possess a breadth of engineering skills with an interest in service reliability, automation, monitoring, and capacity planning
- Understanding of modern application architecture (e.g. microservices, EDA)
- Experience with APM services and solutions (e.g. OpenTelemetry, Honeycomb, New Relic, Dynatrace, AppDynamics, Datadog)
- Experience with time-series observability solutions (e.g. InfluxDB, Prometheus, Grafana)
- Experience with scaled indexed logging solutions (e.g. Splunk, Elasticsearch, OpenSearch)
- Experience running and operating Ruby on Rails applications and infrastructure
- Deep knowledge of major cloud service providers and solutions (Amazon Web Services, Google Cloud Platform, Microsoft Azure)
- Previous experience working within a site reliability engineering culture (e.g. improving reliability through systems engineering automation, chaos testing, synthetics, and process improvement)
- Experience designing, building, implementing, and operating distributed systems and cloud infrastructure at scale
- Experience with container computing and container orchestration (e.g. proprietary systems such as Google Kubernetes Engine (GKE), multi-cloud solutions such as Kubernetes, or Nomad)
- Experience with configuration management systems (e.g. Ansible, Puppet, Chef, SaltStack, Consul)
- Experience with virtual networking (e.g. cloud networking, service mesh, SDN)
- Experience in security automation (e.g. cloud proprietary solutions such as Google Secret Manager or Vault)
- Experience with infrastructure-as-code (e.g. Terraform)
- Strong written communication skills
- Ability to work in an asynchronous environment
- Experience in supporting a 24/7 operational infrastructure including on-call rotations
- Must have an obsession for building quality products
- Ability to thrive when there are changing priorities and shifting of gears
- Strong oral and written communication skills
- Must be a team player with a strong, self-managing work ethic
- Must be a self-starter with a passion for platform engineering, learning and continuous improvement
- Ensure observability tooling and integrations are providing telemetry and logging statistics across the entirety of systems and applications
- Enable the Engineering organization to identify and triage operational issues, empowering teams to own and operate autonomously
- Contribute to defining and executing on the Observability Roadmap in maintaining and modernizing cloud-native observability within the organization
- Integrate telemetry and logging frameworks to the cloud platform
- Evaluate new and existing observability technologies to ensure capabilities are inclusive of black box solutions (e.g. COTS) as well as Engineering-created software
- Manage distributed system and application scaling activity directly (as applicable) as well as in an advisory capacity on behalf of Engineering development teams
Site Reliability Engineer V - McLean, United States - ID
Description
Company Overview
The employer is a high-growth enterprise software company that simplifies how people prove and share their identity online. The company empowers people to control their data through a portable and trusted login, which means they don't need to create a new password when visiting sites that have the company's button. The company's digital identity network has over 117 million registered members, and is used by fourteen federal agencies, agencies in 30 states and over 600 corporations for secure identity proofing and verification.
The company's technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST IAL2 / AAL2 credential service provider by the Kantara Initiative. In addition to helping people control their credentials and data, the company's "No Identity Left Behind" initiative strives to expand digital access and inclusion for all people. The company offers multiple pathways to identity verification – online self-serve, live video chat agents, and in person. The company is passionate about building a robust identity network that does not compromise access for traditionally underserved groups.
The company has received numerous awards including Deloitte's 2023 Technology Fast 500, Washington Business Journal's Fastest Growing Companies, Entrepreneur Magazine's 100 Brilliant Companies and Wall Street Journal's Startup of the Year finalist. In recent quarters, the company announced it raised $132 million in Series D funding, led by Viking Global Investors with participation from CapitalG, Morgan Stanley Counterpoint, FTV Capital, PSP Growth, Auctus Investment Group, Moonshots Capital, and Scout Ventures. The most recent round brings the total investment in the company to over $275 million since its founding in 2010.
The Site Reliability Engineer V (SRE) will combine software and systems engineering to build and run distributed, fault-tolerant systems at scale. SREs ensure our services have the appropriate reliability and uptime to protect and promote our customers' experience.
Note that candidates must be located in the Washington DC or San Francisco Bay area as this role requires an onsite presence.
Responsibilities
Ideal Qualifications
Ideal candidate will thrive in the following culture:
Day to Day Life
We believe that an in-office culture fosters professional growth and development, mentorship, collaboration, and accelerated innovation. This position will be in-office based at one of our locations in either McLean, VA or Sunnyvale, CA. Working in an office together allows our culture to thrive and our team members to establish real connections with their coworkers and the opportunity for lifelong friendships. Our work is critical to protecting online identity and we're confident that working together is how we'll change the world.
The annual base salary listed below for this role is based on experience, skills, education, relevant training and geographic location. Company bonus, incentive for sales roles, equity, and benefits are available depending on the role.
The company offers comprehensive medical, dental, vision, health savings account, flexible spending accounts (medical, limited purpose, dependent care, commuter benefit accounts), basic and voluntary life and AD&D insurance, 401(k) with company match, parental leave, ability to participate in unlimited paid time off subject to the terms and conditions of the PTO policy, including 8 company-wide holidays, short and long-term disability insurance, accident and critical illness insurance, referral bonus policy, employee assistance program, pet insurance, travel assistant program, wellbeing and childcare discounts, benefit advocates, and a learning and development benefit.
The above represents the anticipated total rewards package for this job requisition. Final offers may vary from the amount listed based on qualifications, professional experiences, skills, education, relevant training, geographic location, and other job related factors.
Pay Range
$181,488—$210,000 USD
The company maintains a work environment free from discrimination, where employees are treated with dignity and respect. All employees share in the responsibility for fulfilling our commitment to equal employment opportunity. The company does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. The company adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline. In addition, the company's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.
Please review our Privacy Policy, including our CCPA policy. If you provide the company with any personally identifiable information, you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy.
The company participates in E-Verify.