No more applications are being accepted for this job
- Troubleshoot and resolve technical issues related to collaboration technologies, networking, and infrastructure, utilizing advanced diagnostic tools and techniques
- Perform routine maintenance, upgrades, and patching on collaboration systems, network infrastructure, and associated hardware and software components
- Contribute to the development and implementation of automation solutions using scripting languages like Bash, Python, and industry-standard frameworks to streamline infrastructure management tasks
- Monitor system performance, perform capacity planning, and optimize infrastructure for maximum efficiency and reliability
- Ensure the security and compliance of collaboration systems by implementing and maintaining industry best practices and standards
- Work closely with the Service Delivery Manager and other team members to provide timely technical support to clients, ensuring high-quality service and adherence to SLAs
- Participate in cross-functional projects and collaborate with other teams, such as network, security, and cloud teams, to ensure seamless integration of collaboration solutions
- Maintain up-to-date documentation of technical processes, systems, configurations, and network topologies
- Efficiently handle live production incidents, debug and troubleshoot application and infrastructure issues, and follow and implement SRE best practices
- Provide on-call support as an escalation point for VC infrastructure and network troubleshooting
- Build end-to-end monitoring infrastructure (Logging, Metrics, Tracing) and work closely with the other Production Engineers to provide the right tooling to measure the reliability of our systems
- Collaborate with development and operations teams to ensure availability and reliability of the application and infrastructure
- Work closely with software engineers and QAs to ensure the system is responding properly to non-functional requirements such as performance, security, and availability
- Perform testing and quality assurance around software and hardware used in our environment
- At least 3 years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
- Linux expertise
- Support of internet-facing production services and distributed systems via deployments, on-call, and incident management.
- Proficiency in implementing and coordinating telemetry using monitoring and observability tools like Splunk, Grafana, and Prometheus, or similar.
- Experience troubleshooting and resolving issues in VMware from both an operating system and an application perspective.
- Understanding of ITIL processes, service management principles, and IT service delivery best practices
- Building and operating container orchestration systems like VMware.
- Designing, building and maintaining infrastructure with a cloud provider such as AWS.
- Automation advocate with a history of removing operational toil via software.
- Self-motivated, inquisitive, and always looking to learn more.
- Familiarity with scripting languages like Bash, Python, or similar, and experience with REST APIs
- Experience with systems automation tools like Chef, Ansible, Terraform, or similar.
- Strong foundational knowledge in networking protocols, infrastructure, and troubleshooting techniques, including TCP/IP, DNS, DHCP, VLANs, and routing protocols
- Disaster recovery and capacity planning.
- Strong communication and interpersonal skills, with the ability to work effectively in a team-oriented environment
- Self-motivated and eager to learn new technologies, tools, and methodologies
Service Reliability Engineer - Austin, United States - EOS USA
Description
WHO WE ARE:
EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world's largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities, along with investing in and supporting our partners and employees.
We are a true international IT provider and are proud to deliver our services through global simplicity with trusted transparency.
POSITION OVERVIEW:
EOS IT Solutions is seeking a technically proficient Service Reliability Engineer to join our managed services infrastructure engineering team, supporting advanced collaboration technologies in a fast-paced, industry-leading environment. The ideal candidate is a highly motivated technical enthusiast with a strong foundation in IT, networking, and collaboration technologies, and a passion for continuous learning.
WHAT YOU'LL DO:
Pay Range
$100,000-$115,000 USD