Production System Engineer - San Jose, United States - TikTok

    Description
    Responsibilities
    TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.

    Why Join Us
    Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

    Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.

    To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team.

    Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

    Unlocking the secrets of ByteDance's global tech empire, the Data Systems Infrastructure (DSI) team stands as the unseen architects behind the scenes.

    In a thrilling dance of technology and innovation, we propel the company's meteoric rise by constructing and orchestrating colossal data fortresses, taming the life cycle of server fleets, conjuring Cloud solutions, and crafting a symphony of infrastructure services.

    Our mission is to ensure scalability and unwavering reliability, making sure ByteDance's digital footprint leaves an indelible mark on the world.

    Embark on an exciting expedition to explore the rapidly expanding ByteDance domain in the United States, Europe, and Asia.

    Here, the Data Systems Infrastructure (DSI) team is crafting monumental data citadels that encircle the planet, sheltering legions of hundreds of thousands of servers.

    As the maestro of our production systems, you will embark on a captivating odyssey, taming the life cycles of these servers.

    Your adventure will begin with the orchestration of their initial deployment, navigating the intricate terrain of OS installation, summoning services like a digital magician, and maintaining vigilant watch over our inventory.

    But, like any epic tale, there will be times of challenge when you become a troubleshooter extraordinaire, mending and restoring with unwavering dedication.

    Eventually, you'll guide them into the sunset, orchestrating their decommissioning and ensuring their rebirth through recycling, all while contributing to the pulsating rhythm of ByteDance's technological evolution.


    Responsibilities:

    • Operation: As a Production Systems Engineer, your mission is to contribute to enhancing the quality, reliability, efficiency, effectiveness, and scalability of our data center and Cloud operations, platforms, and services on a worldwide scale
    • Lifecycle Improvement: Engage in and improve the whole lifecycle of Infrastructure systems - from system design consulting through to launch reviews, deployment, operation, and refinement

    • Automation: Deliver tools and solutions to improve the automation, reliability, scalability, and operability of services
    • Monitoring: Deliver tools and solutions to monitor and improve availability, latency, and the overall health of services, servers, Cloud infrastructure, and networks
    • Disaster Recovery: Troubleshoot and resolve complex technical issues in a high-pressure, time-sensitive environment. Conduct high-level root-cause analysis for service interruptions and establish preventive measures. Practice sustainable incident response and postmortems.
    • Cross-team Collaboration: Partner with stakeholders like infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and our internal customers to understand overarching business objectives. You will also have the opportunity to design and implement innovative solutions for our Core IDCs and CDN/Edge and Cloud Services.
    • Technical Documentation: Create and maintain standard operating procedures and knowledge bases

    • On-call: Participate in our cross-continent on-call rotation and incident response teams to solve critical problems in production

    Qualifications

    Minimum Qualifications:

    1. Education:
    Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

    2. Experience:
    Minimum of 3 years of experience in systems infrastructure operations or related fields, working with data center or CDN production systems and system design/validation

    3. Server Hardware:
    We seek individuals with more than just a basic understanding. You should be at an intermediate level or higher, where your hands-on experience in labs or data centers has forged a deep connection with server architecture.

    4. Data Center:
    An intermediate level of expertise is preferred here. We're on the lookout for those who are well-versed in the intricate details of operations, from small tasks like OS installations and break-fix, to high-impact projects like planning and operations (covering the full infrastructure lifecycle), to new design-build facilities or renovations to existing systems.

    5. Monitoring:
    Your knowledge should transcend the ordinary; we prefer intermediate-level skills. We expect you to be a maestro in the orchestration of tools and designs for monitoring server health, network switches, and the power and temperature conditions of the data center.

    6. Automation:
    We welcome those who have delved into the realm of automation, ideally at an intermediate level. Your qualifications should reflect at least one automation project, showcasing your commitment to streamlining processes.

    7. Linux:
    In the realm of Linux, we are in search of individuals with intermediate-level proficiency. Your mastery of this operating system should shine brightly.

    8. Coding:
    As you navigate the digital landscape, fluency in Bash, Python, and Golang is strongly favored. Your coding skills will be your trusty companions on this adventure.

    9. Network:
    When it comes to networks, we're seeking at least a junior-level understanding. Your ability to chart the course through the network labyrinth is essential.

    10. Communication:
    Experience managing and coordinating teams in a global environment

    11. Project Management:
    Experience in the preparation of project plans and specifications, drafting scopes of work, and managing multiple projects simultaneously.

    12. Experience in Agile methodologies (e.g., Kanban, Scrum), including user stories, sprint planning, and backlog management.

    13. Preferred But Not Required Skills:
    Golang, REST APIs, Gin, Ansible, Load Balancer, SQL, Hive, Hadoop, Clickhouse, Message Queue, Redis.

    TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy.

    To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach.

    We are passionate about this and hope you are too.

    TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws.

    If you need assistance or a reasonable accommodation, please reach out to us at

    Information

    [For Pay Transparency] Compensation Description (annually)
    The base salary range for this position in the selected city is $ $205000 annually.

    Compensation may vary outside of this range depending on a number of factors, including a candidate's qualifications, skills, competencies and experience, and location.

    Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives, and restricted stock units.

    Our company benefits are designed to convey company culture and values, to create an efficient and inspiring work environment, and to support our employees to give their best in both work and life.


    We offer the following benefits to eligible employees:

    We cover 100% of the premium for employee medical insurance and approximately 75% of the premium for dependents, and offer a Health Savings Account (HSA) with a company match.

    We also offer Dental, Vision, Short/Long term Disability, Basic Life, Voluntary Life and AD&D insurance plans, in addition to Flexible Spending Account (FSA) options such as Health Care, Limited Purpose and Dependent Care.


    Our time off and leave plans are: 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) (prorated upon hire and increased by tenure) and 10 paid sick days per year as well as 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.

    We also provide generous benefits such as mental and emotional health benefits through our EAP and Lyra, a 401K company match, and gym and cellphone service reimbursements. The Company reserves the right to modify or change these benefits programs at any time, with or without notice.