Site Reliability Engineer

About Media.net

Media.net is one of the world’s leading companies in contextual advertising, providing a wide range of advertising and traffic monetization solutions. Since its founding, Media.net has consistently broken new ground in building innovative contextual advertising solutions. We serve real-time contextual ads customized for each visitor and each page view, across billions of visitors and tens of millions of sites/domains, on a server infrastructure that runs 1000+ CPUs, terabytes of RAM, and hundreds of petabytes of data.

With 1000+ employees, Media.net has one of the largest teams worldwide building a global contextual advertising business. By market cap, Media.net is one of the five largest ad tech companies worldwide. By revenue, it is the second-largest contextual advertising business worldwide.

Media.net was acquired by a Chinese consortium for USD 900 million in the third-largest ad tech deal ever. This acquisition will fuel Media.net’s global expansion strategy and provide access to China’s world-class talent and capital markets.

About the team

The Site Reliability Engineering team at Media.net is responsible for the scaling, performance, monitoring, security, and availability of the production environment. The focus is to architect, develop, automate, and deploy products and infrastructure based on Linux and Linux application stacks.

Our environment consists of our own bare-metal servers and private cloud across co-located datacenter facilities, plus the AWS public cloud. Our engineering teams follow DevOps practices, and we rely heavily on open source tools such as Jenkins, Selenium, Git, Puppet, Docker, Kubernetes, OpenStack, Nagios/Icinga, Kafka, Graphite, Hadoop, ELK, and Vault. SRE teams work primarily in Python and Go.

What is the job like?

  • Engage proactively with product and engineering teams to design, build, and maintain highly available systems and software, and drive operational best practices
  • Identify and drive opportunities in making resilient systems that help maintain business continuity
  • Troubleshoot proactively, perform root cause analysis (RCA), and implement permanent fixes for issues across the stack: hardware, software, database, network, and so on
  • Implement proactive monitoring, alerting, trend analysis, and self-healing systems
  • Develop continuous delivery for multiple platforms in production and staging environments
  • Find areas that still require manual intervention and replace them with automation wherever possible
  • Demonstrate ability to design, implement and manage highly available, scalable and reliable systems
  • Maintain infrastructure and platform security
  • Effectively use and maintain infrastructure and configuration management tools such as Puppet, Chef, Ansible, and Terraform to deploy and manage infrastructure
  • Mentor and coach team members on technical matters
  • Adapt to a fast-paced environment and reprioritize as business needs change

Who should apply for this role?

  • B.Tech/M.Tech or equivalent in Computer Science, Information Technology, or a related field
  • 2-5 years of experience in handling services in a large-scale distributed systems environment
  • Experience with Unix/Linux operating system internals and administration (e.g. filesystems, inodes, system calls, etc.)
  • Deep understanding of the network stack (e.g. TCP/IP, routing, network topologies and hardware, SDN, etc.)
  • Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes generally, microservices, and so on
  • Excellent programming (Python, Go, Ruby, or another preferred scripting language) and automation skills
  • Ability to work independently and own problem statements end-to-end
  • Great communication, interpersonal and teamwork skills
  • Expertise in some of the following tools and skills:
    • Container orchestration technologies like Kubernetes and Mesos
    • Virtualization platforms, either on-prem or cloud-based (we use OpenStack and AWS)
    • Infrastructure as code (we use Puppet, Ansible, and Terraform) and containerization toolsets (we use Docker)
    • Data-intensive applications and platforms like Kafka, Hadoop, Spark, Zookeeper, Cassandra, PostgreSQL OLAP, Druid
    • Relational databases like MySQL, Oracle, PostgreSQL, etc.
    • NoSQL databases like Redis, MongoDB, Cassandra, CouchDB, etc.
    • One or more CI tools like Jenkins, TeamCity
    • Centralized logging systems, metrics, and tooling frameworks such as ELK, Prometheus, and Grafana
    • Web and Application servers like Apache, Nginx, Tomcat
    • Version control tools such as Git

Benefits & Perks

At Media.net, people love their jobs, and not just because we offer the most competitive salaries in the industry. Our excellent benefits include everything from great medical and life insurance to catered meals. Our workspaces are comfortable and fun, complete with bean bag chairs, ping pong tables, and all the snacks you can eat. We have no dress code (tee-shirts are a-ok!). We have flexible work hours and flexible holidays, which means that teams pick their own work hours. Media.net has its own concierge desk that doubles as a travel agency.

We are passionate about building the next generation of web products, and we believe that happy employees are the key to achieving this goal. If you like the idea of working in an exciting workspace on cutting-edge internet products that make a truly global impact (and wearing flip-flops to work), then we want to get to know you!