Site Reliability Engineer - Chicago, United States - Cloudious LLC

    Job Description

    ROLE:
    As a Site Reliability Engineer in Wealth Management, you will be responsible for application
    observability, maintenance, and support: proactively identifying and implementing preventive
    measures, and evaluating and recommending techniques, practices, or technologies that serve
    business needs. As an SRE associate, you will collaborate with Application Support and
    Development teams to implement business solutions through agile practices and to manage
    production issues.
    The ideal candidate will possess excellent leadership and communication skills coupled with a solid
    understanding of modern cloud technologies, preferably in the financial sector.


    PRINCIPAL RESPONSIBILITIES:
    Lead production stability efforts by preventing production issues and improving production
    stability.
    Define and enforce Service Level Objectives (SLOs), Service Level Agreements (SLAs), and error
    budgets to guarantee system reliability and availability.
    Monitor key performance indicators, such as response times, error rates, and uptime, to align
    operational performance with overarching business objectives.
    Proactively identify continuous improvement opportunities, such as reducing manual effort,
    automating tasks and resolutions, or decreasing production incidents.
    Help define and deploy monitoring, metrics, and logging systems, and develop application
    dashboards.
    Ensure near-zero downtime with monitoring and alerting, self-healing automation, and
    continuous improvement.
    Provide reactive, break-fix support and communicate issue and resolution status (written and
    verbal) to the project team and management.
    Develop into a Subject Matter Expert for the assigned application domain.
    Interpret alerts from tools such as SiteScope, Dynatrace, and ELK, and refer to them when
    performing root cause analysis (RCA) of an issue.
    Be flexible about upskilling to new tech stacks and ready to do hands-on development.
    Provide regular, high-quality updates to all stakeholders on the progress of the work,
    including user stories and ITSM problems.
    Attend regular meetings with Project/Development teams to discuss production issues for
    prioritization, fixes, and release.

    SKILLS / EXPERIENCE:
    5-6 plus years of application development experience using modern technologies and
    architecture, including experience collaborating with technology teams.
    2 plus years of Site Reliability Engineering experience.
    Good understanding of at least one public cloud, preferably Microsoft Azure or Pivotal Cloud
    Foundry.
    Strong understanding of REST APIs and how to use them in practice.
    Strong experience with continuous integration and collaboration tools such as Azure DevOps, JIRA,
    Bitbucket, GitHub, and Confluence.
    Good knowledge of and hands-on experience with command-line tools: Bash, Linux, Azure CLI, etc.
    Experience in some of the following technologies: Java, J2EE, Pivotal Cloud Foundry, Cloud
    Computing (IaaS, PaaS, and SaaS), RESTful interfaces, Git, Gradle, Maven, NPM, Spring (Spring
    Batch and Spring Boot), CSS3, HTML4, Oracle PL/SQL, and Kafka.


    NTAC:
    3NS-20
    Strong communication and interpersonal skills, along with a solid technical background, are
    essential, as is the ability to multitask in a fast-paced environment. The ideal candidate can
    explain technical issues in layman's terms and translate business needs to technology teams and
    back.
    Experience working effectively with diverse groups around the world, including IT management,
    technology staff, business partners, consultants, vendors, and clients.