Site Reliability Engineer - Waterloo, Canada - NCR

    NCR

    Site Reliability Engineer (AWS)


    Location: WATERLOO, CAN

    Time type: Full time

    Posted: 4 Days Ago

    Job requisition ID: R

    About NCR VOYIX

    NCR VOYIX Corporation (NYSE: VYX) is a leading global provider of digital commerce solutions for the retail, restaurant and banking industries. NCR VOYIX is headquartered in Atlanta, Georgia, with approximately 16,000 employees in 35 countries across the globe.

    For nearly 140 years, we have been the global leader in consumer transaction technologies, turning everyday consumer interactions into meaningful moments.

    Today, NCR VOYIX transforms stores, restaurants and digital banking experiences with cloud-based, platform-led SaaS and services capabilities.

    Not only are we the leader in the market segments we serve and the technology we deliver, but we create exceptional consumer experiences in partnership with the world's leading retailers, restaurants and financial institutions.

    We leverage our expertise, R&D capabilities and unique platform to help navigate, simplify and run our customers' technology systems.
    Our customers are at the center of everything we do.

    Our mission is to enable stores, restaurants and financial institutions to exceed their goals – from customer satisfaction to revenue growth, to operational excellence, to reduced costs and profit growth.

    Our solutions empower our customers to succeed in today's competitive landscape.
    Our unique perspective brings innovative, industry-leading tech to all the moving parts of business across industries.

    NCR VOYIX has earned the trust of businesses large and small, from the best-known brands around the world to your local favorite around the corner.


    Title: Site Reliability Engineer

    Job Role:
    We are looking for a Site Reliability Engineer (SRE) to join our SRE team and help build scalable systems that apply automation best practices to improve reliability and velocity, and to enable monitoring of the operational health of stacks throughout their life cycle, including metrics collection, aggregation, and visualization.

    As a member of the SRE team, you will support NCR's Financial Services business unit and its product and technology teams in improving the design and operation of systems, making them scalable, reliable and efficient while ensuring the performance and high availability of products and services that primarily reside in the cloud.

    You will influence the development and implementation of reliable production systems and services to address emerging business needs (such as cloud-based SaaS).

    SREs pride themselves on the resiliency and stability of production systems, yet remain committed to innovation and operational improvement through the application of software engineering practices to operations.

    You will make our products easier to adopt and use by making improvements to the product, tools, processes and documentation.

    You are someone who strives for six nines (99.9999%) or better of service availability.

    Job Description:
    You will be responsible for maintaining and scaling production services and servers for complex and high throughput cloud services.
    You will bridge and own the space between development, quality, security and operations.
    You will improve scalability, service reliability, capacity, and performance.
    You will write automation code for provisioning and operating infrastructure at massive scale.
    You are not an operator; you are an experienced software engineer focused on operations.

    You will initiate and contribute to continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team to empower and accelerate product development.

    You will use automation extensively to design, configure, manage, and monitor systems in support of our product development teams.
    You will participate in disaster recovery planning and execution.
    You will be responsible for maintaining and patching servers that support SaaS products, including Windows and Linux servers running in in-house data centers and/or on cloud PaaS providers (primarily GCP and Azure).

    You'll work hand-in-hand with all teams to ship our code to production using Continuous Integration / Continuous Deployment (CI/CD) and AppSec tooling.

    You will collaborate with development teams and use intuition, experience and understanding to create SLIs, SLOs, and SLAs.
    You will provide timely assistance and remediation solutions during critical situations and production incidents to help resolve service problems. (You will be on-call for periods of time.)
    You will develop the monitoring architecture, implement monitoring agents, build dashboards, and manage escalations and alerts.
    You will participate in incident management and drive root cause analysis (RCA) and risk management processes.

    YOU HAVE:
    BS degree in Computer Science or a related technical field, or 5 years of relevant prior experience

    Experience in a DevOps / SRE role, with demonstrable experience deploying and managing large-scale production environments in GCP, AWS or Azure and in multi-datacenter environments

    Experience developing and debugging code in one or more of the following: Java, C, C++, .NET, Python, Ruby, Go, Shell, Perl, JavaScript
    3+ years deploying and supporting high-traffic, scalable web applications/services
    Experience with Linux, shell scripting, PKI and TLS/SSL, networking, firewalls, load balancers and backups
    Experience with one or more CI tools such as GitHub, Jenkins or Artifactory
    Experience with orchestration, automation, and configuration management tools such as Terraform, Ansible and Helm (or related technologies)
    Experience with log management, including aggregation, alerting, and graphing (e.g., Sensu, Stackdriver, Prometheus, ELK or TICK stacks)
    Excellent analysis, debugging, root-cause identification, and troubleshooting skills

    YOU MIGHT ALSO HAVE:
    3+ years with cloud virtualization and PaaS
    3+ years with Docker, Kubernetes and early versions of OpenShift
    Experience with Kubernetes, system virtualization, on-prem and/or hybrid cloud computing, cloud identity and security systems, cloud monitoring and logging, and/or local/cloud storage
    Experience with application disaster recovery, migration, roll-back plans, expansion, routine deployments, and system upgrades
    Experience securely hosting and troubleshooting public-facing services in GCP, Azure or AWS
    Experience in designing, analyzing and running large-scale distributed systems
    Experience with Cassandra, Elasticsearch or Kafka
    Experience with CI/AppSec tools – Sonar, Coverity, WhiteSource, Seeker, Aqua
    Cloud certifications
    Offers of employment are conditional upon passage of screening criteria applicable to the job.
    EEO Statement
    Integrated into our shared values is NCR Voyix's commitment to diversity and equal employment opportunity.

    All qualified applicants will receive consideration for employment without regard to sex, age, race, color, creed, religion, national origin, disability, sexual orientation, gender identity, veteran status, military service, genetic information, or any other characteristic or conduct protected by law.

    NCR Voyix is committed to being a globally inclusive company where all people are treated fairly, recognized for their individuality, promoted based on performance and encouraged to strive to reach their full potential.

    We believe in understanding and respecting differences among all people. Every individual at NCR Voyix has an ongoing responsibility to respect and support a globally diverse environment.
    Statement to Third Party Agencies

    To ALL recruitment agencies:
    NCR Voyix only accepts resumes from agencies on the preferred supplier list. Please do not forward resumes to our applicant tracking system, NCR Voyix employees, or any NCR Voyix facility. NCR Voyix is not responsible for any fees or charges associated with unsolicited resumes.

    The Company has an accommodation process in place and provides accommodations for employees with disabilities. If you require a specific accommodation because of a disability or a medical requirement, please contact Human Resources via e-mail at

    AODA..COM

    so that arrangements can be made for the appropriate accommodations to be in place.

    About Us

    Help us run the world's top brands.
    At NCR, you'll have meaningful and relevant work experiences, opportunities to learn and make a real contribution. Every day we build on our 130+ year history to transform, connect and run our customers' technology platforms.

    A career here means embracing our innovative culture and shared values, always seeking new adventures and carving your own path.
