Docker Amazon Web Services (AWS) Kubernetes teamleader agile-project-management Java Python Agile TSQL Azure NoSQL Linux Cloud Computing Ansible Jira SaaS AWS PowerShell DevOps PostgreSQL ITIL Kafka

About this job

Job type: Full-time
Experience level: Lead, Manager
Role: DevOps, Product Manager, System Administrator
Industry: Aviation, Information Technology, Software Development
Company size: 5k-10k people
Company type: Private



Job description


The Manager, SRE & DevOps leads all aspects of SRE & DevOps, applying outstanding technical and data expertise to deliver a world-class user experience across all the services we provide. As Manager, SRE & DevOps, you will lead a team that builds and runs large-scale, fault-tolerant systems and services. Cultural fit is a must: you will need to be a self-motivated, data-driven, results-oriented critical problem solver focused on delivering an outstanding user experience.

You will manage SRE-centric efforts across independent functional teams comprising Architecture, Engineering, Security and Solution Architecture. You will lead a strong and experienced team in negotiating requirements with demanding internal and external clients, pushing toward project milestones, and driving daily agile-style stand-ups that promote communication and keep the team motivated.

Key Responsibilities

  • Define Critical Success Factors and Key Performance Indicators (KPIs) for processes, and drive consistent reporting on them across the organization to identify trends, anticipate problems, and ensure a best-in-class level of support and service.
  • Own process definition, promotion, and governance, and drive implementation, adoption, and continuous improvement across the organization to enable business process improvement and innovation and to create a service culture.
  • Prioritize and maintain the backlog based on business needs to meet tight deadlines, ensure agile practices are followed when planning weekly sprints, and communicate on behalf of the team to report on progress, risks, and achievements.
  • Identify inter-dependencies between the various partner groups to ensure all are aligned and risks are identified, mitigated and communicated.
  • Build a knowledge base with lessons learned from incidents and support issues to support
  • Work with other teams to encourage DevOps practices (deployment, monitoring, observability, scalability).
  • Build software and systems to manage infrastructure and applications through automation; deploy, support, and monitor existing and new services, platforms, and application stacks; and increase operational efficiency by automating and reducing manual tasks.
  • Establish Service Level Agreements and Operational Level Agreements, and monitor, improve, and report performance against these and other key performance indicators.

Knowledge & Skills

  • Proven leader who combines technical expertise with well-developed business acumen, strong analytic and problem-solving skills leading to effective decision making that enables process improvements
  • Excellent problem-solving, critical-thinking, and interpersonal skills; leads by example to empower and challenge the team to deliver their best
  • Excellent communication skills for working across the organization, capable of building strong relationships with peers and leadership
  • Hands-on experience managing large, transformational projects and leading organizational change management initiatives
  • Ability to prioritize and execute tasks in a high-pressure environment and make sound decisions in emergency situations
  • Ability to deliver quantitative metrics of the environment to help with planning and execution of service delivery


  • 4+ years of people management and team leadership experience developing strong and motivated teams with B. Tech./B.E. degree in Electronics & Telecomm or Computer Science.
  • 5+ years of demonstrated ability in site reliability and technical operations leadership
  • Background in leading infrastructure / DevOps / SRE for highly-available, large-scale SaaS platforms and experience with modern SRE & DevOps practices
  • Solid understanding of software development, debugging, optimization, and/or troubleshooting - hands-on experience with common programming languages preferred
  • Experience building large and geographically dispersed infrastructure supporting business-critical cloud and on-premises services
  • Experience leading on security concerns, especially in the context of hosted environments and operations identity management. Having led an organization through HITRUST, ISO, or other security certification is a strong asset.
  • Experience operating and maintaining production systems in Linux-based private and public cloud environments (Azure and/or AWS preferred)
  • Extensive experience leading teams responsible for customer-facing systems in a high-uptime 24/7 environment
  • Expertise in analyzing sophisticated application, database, network, and OS issues across a distributed, large-scale, business-critical system
  • Depth and breadth of experience with server-side Java development, relational databases, eventually consistent, high-efficiency, cluster-based NoSQL solutions, and distributed streaming platforms
  • Experience with configuration management and code deployment, and with automating tasks such as setting up centralized log collection, monitoring, vulnerability patching, and security audits
  • Experience with monitoring and ticketing tools such as ServiceNow, Jira, New Relic and Nagios
  • Experience leveraging programming/scripting platforms (Unix shell, PowerShell, Python) and infrastructure-as-code tools such as Ansible or Terraform to increase operational efficiency and consistency through automation of repeatable tasks
  • Good understanding of APIs, TLS, HTTP & DNS
  • Experience working with PostgreSQL, MS SQL, SQL, Lucene and MongoDB
  • Experience with Kubernetes, Docker, Kafka, Bitbucket and Bamboo
  • Experience with 24/7 site monitoring and ability to own uptime and performance SLAs
  • Operational experience at scale - designing and operating highly available, scalable, and fault-tolerant systems using best-of-breed technologies like containers, APIs, Data Platform, etc.
  • Strong knowledge of ITIL best practices
