Site Reliability Engineer - Tampa, United States - IdelSoft

    IdelSoft Tampa, United States

    2 weeks ago

    Description
    About the job: Site Reliability Engineer (Java)

    This is a 1099 (independent contractor) role, open only to US-based candidates with valid work authorization.

    We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to focus on the reliability, performance, and efficiency of trading enclave products and services.

    Key Responsibilities:

    • Lead the design, implementation, and management of SRE strategies for high availability, performance, and scalability.
    • Spearhead incident management processes, including response, postmortem analysis, and preventive measures.
    • Utilize advanced monitoring tools (Dynatrace, Splunk, Grafana) to proactively identify and troubleshoot system issues.
    • Conduct thorough root cause analysis (RCA) on incidents and errors, developing long-term solutions.
    • Collaborate closely with the development team to integrate SRE best practices into the software development lifecycle.
    • Serve as a Java Subject Matter Expert (SME), guiding the development of robust, scalable, and efficient applications.
    • Develop and maintain CI/CD pipelines, leveraging DevOps tools and methodologies.
    • Mentor junior SRE and development team members, fostering a culture of continuous learning and improvement.

    Qualifications:

    • Bachelor's degree in Computer Science, Engineering, or related field. Advanced degree preferred.
    • 5+ years of experience in site reliability engineering, DevOps, or similar roles, with a focus on trading systems or financial services.
    • Proficient in monitoring and observability tools (Dynatrace, Splunk, Grafana).
    • Strong background in software development, particularly in Spring Boot, Core Java, and Java Design Patterns.
    • Expertise in incident management, root cause analysis, and troubleshooting complex systems.
    • Solid understanding of CI/CD principles, infrastructure as code, and automation tools.
    • Strong problem-solving skills and the ability to work independently or as part of a team.
    • Strong communication and leadership skills, with the ability to mentor and guide team members.
    • Expertise in automation and scripting languages (e.g., Python, Shell).
    • Solid understanding of on-prem and hybrid cloud infrastructure (VMware, Linux, Windows, Azure) and containerization and orchestration (Docker, Kubernetes).
    • Working knowledge of MongoDB, Kafka, and IBM mainframe DB2 (preferred).
    • Conversant with WebLogic and Java technology stacks, including Spring Boot (expert-level skills not required).
    • Certifications in relevant technologies (Dynatrace, Splunk) are a plus.