No more applications are being accepted for this job
- Lead the design, implementation, and management of SRE strategies for high availability, performance, and scalability.
- Spearhead incident management processes, including response, postmortem analysis, and preventive measures.
- Utilize advanced monitoring tools (Dynatrace, Splunk, Grafana) to proactively identify and troubleshoot system issues.
- Conduct thorough root cause analysis (RCA) on incidents and errors, developing long-term solutions.
- Collaborate closely with the development team to integrate SRE best practices into the software development lifecycle.
- Serve as a Java Subject Matter Expert (SME), guiding the development of robust, scalable, and efficient applications.
- Develop and maintain CI/CD pipelines, leveraging DevOps tools and methodologies.
- Mentor junior SRE and development team members, fostering a culture of continuous learning and improvement.
- Bachelor's degree in Computer Science, Engineering, or a related field. Advanced degree preferred.
- 5+ years of experience in site reliability engineering, DevOps, or similar roles, with a focus on trading systems or financial services.
- Proficient in monitoring and observability tools (Dynatrace, Splunk, Grafana).
- Strong background in software development, particularly in Spring Boot, Core Java, and Java Design Patterns.
- Expertise in incident management, root cause analysis, and troubleshooting complex systems.
- Solid understanding of CI/CD principles, infrastructure as code, and automation tools.
- Strong problem-solving skills and the ability to work independently or as part of a team.
- Strong communication and leadership skills, with the ability to mentor and guide team members.
- Expertise in automation and scripting languages (e.g., Python, Shell).
- Solid understanding of on-prem and hybrid cloud infrastructure (VMware, Linux, Windows, Azure) and container orchestration (Kubernetes, Docker).
- Working knowledge of MongoDB, Kafka, and IBM mainframe DB2 (preferred).
- Conversant with WebLogic and Java technology stacks, including Spring Boot (expert-level skill not required).
- Certifications in relevant technologies (Dynatrace, Splunk) are a plus.
Site Reliability Engineer - Tampa, United States - IdelSoft
Description
About the job: Site Reliability Engineer (Java)
This is a 1099-based role, open only to US-based candidates with valid work authorization.
Seeking a highly skilled and experienced Site Reliability Engineer (SRE) focused on the reliability, performance, and efficiency of trading enclave products and services.
Key Responsibilities:
Qualifications: