IND (New) Support / DevOps Engineer

Hyderabad, Telangana, India | Engineering | Full-time | COVID-19 remote


Founded in 2002, Quantium combines the best of human and artificial intelligence to power possibilities for individuals, organisations and society. Our solutions make sense of what has happened and what will, could or should be done to reshape industries and societies around the needs of the people they serve. 

As one of the world’s fully diversified data science and AI leaders, we operate across every sector of the economy, and we’re growing fast. With growth comes opportunity! We’re passionate about building out our team of smart, fun, diverse and motivated people. 

Our team of experts spans data scientists, actuaries, statisticians, business analysts, strategy consultants, engineers, technologists, programmers, product developers and futurists, all dedicated to harnessing the power of data to drive transformational outcomes for our clients. 

We actively foster a culture where our people can stretch themselves to reach their full potential. We also know that work has to work for you, and modern life is fast-paced and balance can be tricky. You want to work where you are respected and valued as an individual, not a number. Quantium embraces a flexible and supportive environment dedicated to powering possibilities for our team members, clients and partners. 

Role summary

As a Support Engineer, you will be responsible for supporting a set of Quantium products, covering the following scope of services:

Client Provisioning

  • Managing the process of client provisioning: working with platform and infrastructure teams to provision new client infrastructure and the associated storage and service accounts
  • Setting up client-specific environments to support our standard product release processes
  • Building data pipeline packages, including application- and Spark-specific configuration, data assets, and client-specific data transfer and ETL processes
  • Installing any real-time systems using the components provided by the core product team

BAU Operations

  • Performing maintenance and upgrades to underlying operating systems (or Docker images)
  • Executing and monitoring client data pipelines on their scheduled cadence (typically monthly, but daily if required) to ensure the delivery of data assets
  • Monitoring real-time services to ensure performance and availability continue to meet the agreed SLAs
  • Accurately assessing the impact of issues against severity definitions to establish the urgency of a response
  • Investigating issues and escalating to the appropriate staff engineering teams when they can't be resolved directly
  • Escalating either by contacting staff engineering teams directly or by logging JIRA tickets with adequate information about the issue (depending on severity)
  • Responding to triaged support tickets from customers
  • Where needed, proactively contacting customers to communicate ongoing outages and provide estimated times to resolution
  • Maintaining internal and external status pages detailing incidents in production environments
  • Logging detailed shift summaries and incident reports in OpsGenie (or PagerDuty, etc.)

Continuous Delivery and Improvement

  • Managing upgrades to the client-specific packages to bring in newly developed core features and continually improve the performance and reliability of the system, which involves:
  • Liaising with the core product team to understand any upgrade procedures required
  • Working with the operations analysts to sign-off the impact of these changes from both technical and analytics perspectives
  • Performing deployments via CI/CD systems to bring production, staging, and UAT systems in line with stable software releases
  • Analysing systems and adding metrics in order to improve system instrumentation and operational observability
  • Building dashboards, monitors, and test automation to ensure teams have visibility of operational state of systems in (near) real-time
  • Designing and building tools to automate repetitive support tasks

Key responsibilities

  • Troubleshooting & Problem Solving: Product support engineers work on requests filed by end users of the company’s products or systems. Their primary responsibility is troubleshooting, solving problems and resolving errors. Throughout their work, they must continuously log details, both for later reporting and to keep customers updated.
  • Root cause analysis: When technical issues with the product arise, production support engineers must act quickly to analyse the available data and find the root cause of the problem.
  • Update Users: They may work on a solution themselves or pass the problem on to other engineering team members, keeping users updated on progress throughout.
  • Customer Service: This role entails interacting with product users, often external customers but sometimes also employees. These interactions can occur in various settings, including in-person meetings, phone calls, emails and live messaging chats. In all of these cases, it’s vital to address concerns promptly and maintain a helpful attitude.
  • Build Knowledgebase & Documentation: Production support engineers prepare extensive documentation when logging product issues, as they must note all details, including their observations, diagnoses, and action steps. Other common tasks include weekly reports summarizing production performance, release notes for upgrades, and troubleshooting guides.
  • Product & process improvements: Because production support engineers deal with product issues firsthand, they can readily suggest overall product improvements, such as features that customers want. Ideally, they should also proactively evaluate engineering processes and recommend ways to increase efficiency.

Experience and education required

  • B.E. / M.E. in Computer Science, Information Technology, Electronics and Communications, or equivalent, with 2+ years of industry experience
  • Proficiency in (at least) one scripting language, such as Ruby or Python
  • Proficiency in shell scripting and using GNU tools under Linux
  • Exposure to cloud computing technologies, especially networking and system architecture, would be beneficial
  • Exposure to Infrastructure/Configuration as Code and CI/CD technologies would also be beneficial
  • Exposure to Docker, Kubernetes, Ansible and Terraform would be a plus
  • Problem-solving skills: excel at resolving problems encountered by users
  • Have a deep understanding of the products you support and the processes behind them
  • Strong attention to detail, a very important trait for successful support engineers
  • Excellent communication skills and the ability to liaise with both customers and internal stakeholders to explain issues and provide updates
  • Ability to understand and articulate technical concepts in clear terms so that documentation around incidents is concise and unambiguous
  • Be a strong generalist with a demonstrated track record of solving complex problems
  • Be curious about why things aren't working and willing to take the time to think critically when investigating issues
  • Be capable of building tooling to automate repeatable tasks, or of contributing back to tool codebases when gaps are identified

What does success look like?

  • Tickets delivered within SLAs, minimal reactivations and high user satisfaction
  • Established as an SME on product support and BAU processes, infrastructure insights and technical knowledge
  • Automated processes established wherever feasible to make support highly efficient
  • Availability SLAs for services, BAU processes and systems met through proper monitoring of systems and services for both availability and performance
  • Well-defined dashboards and reports that clearly publish support KPIs and performance metrics
  • Strong process documentation and knowledgebase created