Service Reliability Engineer - Austin, United States - EOS USA

    Description
    WHO WE ARE:

    EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world's largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities, along with investing in and supporting our partners and employees.

    We are a true International IT provider and are proud to deliver our services through global simplicity with trusted transparency.

    POSITION OVERVIEW:

    EOS IT Solutions is seeking a technically proficient Service Reliability Engineer to join our managed services infrastructure engineering team, supporting advanced collaboration technologies in a fast-paced, industry-leading environment. The ideal candidate is a highly motivated technical enthusiast with a strong foundation in IT, networking, and collaboration technologies, and a passion for continuous learning.

    WHAT YOU'LL DO:
    • Troubleshoot and resolve technical issues related to collaboration technologies, networking, and infrastructure, utilizing advanced diagnostic tools and techniques
    • Perform routine maintenance, upgrades, and patching on collaboration systems, network infrastructure, and associated hardware and software components
    • Contribute to the development and implementation of automation solutions using scripting languages like Bash, Python, and industry-standard frameworks to streamline infrastructure management tasks
    • Monitor system performance, perform capacity planning, and optimize infrastructure for maximum efficiency and reliability
    • Ensure the security and compliance of collaboration systems by implementing and maintaining industry best practices and standards
    • Work closely with the Service Delivery Manager and other team members to provide timely technical support to clients, ensuring high-quality service and adherence to SLAs
    • Participate in cross-functional projects and collaborate with other teams, such as network, security, and cloud teams, to ensure seamless integration of collaboration solutions
    • Maintain up-to-date documentation of technical processes, systems, configurations, and network topologies
    • Efficiently handle live production incidents, debug and troubleshoot application and infrastructure issues, and follow and implement SRE best practices
    • Provide on-call support as an escalation point for VC infrastructure and network troubleshooting
    • Build end-to-end monitoring infrastructure (Logging, Metrics, Tracing) and work closely with the other Production Engineers to provide the right tooling to measure the reliability of our systems
    • Collaborate with development and operations teams to ensure availability and reliability of the application and infrastructure
    • Work closely with software engineers and QA engineers to ensure the system meets non-functional requirements such as performance, security, and availability
    • Perform testing and quality assurance around software and hardware used in our environment
    WHAT YOU NEED TO SUCCEED:
    • At least 3 years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
    • Linux expertise
    • Support of internet-facing production services and distributed systems through deployments, on-call, and incident management.
    • Proficiency in implementing and coordinating telemetry using monitoring and observability tools like Splunk, Grafana, and Prometheus, or similar.
    • Experience troubleshooting and resolving issues in VMware from both an operating system and application perspective.
    • Understanding of ITIL processes, service management principles, and IT service delivery best practices
    • Building and operating container orchestration systems like VMware.
    • Designing, building and maintaining infrastructure with a cloud provider such as AWS.
    • Automation advocate with a track record of removing operational toil through software.
    • Self-motivated, inquisitive, and always looking to learn more.
    • Familiarity with scripting languages like Bash, Python, or similar, and experience with REST APIs
    • Experience with systems automation tools like Chef, Ansible, Terraform, or similar.
    ADDITIONAL REQUIREMENTS:
    • Strong foundational knowledge in networking protocols, infrastructure, and troubleshooting techniques, including TCP/IP, DNS, DHCP, VLANs, and routing protocols
    • Experience with disaster recovery and capacity planning.
    • Strong communication and interpersonal skills, with the ability to work effectively in a team-oriented environment
    • Self-motivated and eager to learn new technologies, tools, and methodologies
    EOS is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity employer. We invite you to consider opportunities at EOS regardless of your gender; gender identity; gender reassignment; age; religious or similar philosophical belief; race; national origin; political opinion; sexual orientation; disability; marital or civil partnership status or other non-merit factor.

    Pay Range

    $100,000-$115,000 USD