Apache Kafka

Never touched this thing before, but it seems cool.

Apache Kafka is a high-throughput distributed messaging system.
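To make "messaging system" a bit more concrete, here is a toy in-memory model of the idea Kafka is built around. A topic is split into partitions, and each partition is an ordered, append-only log. A message's position in that log is its offset. Consumers keep track of their own read position, and reading a message does not delete it. Everything below (`ToyTopic`, `ToyConsumer`, `send`, `poll`) is hypothetical and is not the Kafka client API. The key-to-partition hash is also an assumption: `zlib.crc32` stands in for Kafka's default partitioner, which hashes key bytes with murmur2. There is no networking, replication, or persistence.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic (NOT the Kafka client API)."""

    def __init__(self, num_partitions: int = 2) -> None:
        # Each partition is an ordered, append-only list of messages.
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def send(self, key: str, value: str) -> tuple[int, int]:
        # Same key -> same partition, so per-key order is preserved.
        # crc32 is an illustrative stand-in for Kafka's murmur2-based default partitioner.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        log = self.partitions[p]
        log.append(value)
        return p, len(log) - 1  # (partition, offset of the new message)


class ToyConsumer:
    """Hypothetical consumer that tracks its own offset per partition."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        # Next offset to read in each partition; starts at 0 (the beginning of the log).
        self.offsets: dict[int, int] = defaultdict(int)

    def poll(self) -> list[tuple[int, int, str]]:
        # Return every message not yet seen by THIS consumer as (partition, offset, value).
        records = []
        for p, log in enumerate(self.topic.partitions):
            for off in range(self.offsets[p], len(log)):
                records.append((p, off, log[off]))
            self.offsets[p] = len(log)  # advance this consumer's position only
        return records


topic = ToyTopic(num_partitions=3)
topic.send("user-1", "login")
topic.send("user-1", "click")
a, b = ToyConsumer(topic), ToyConsumer(topic)
print(a.poll())  # both messages, in order
print(b.poll())  # the same messages again: reading did not remove them
print(a.poll())  # nothing new for consumer a
```

The point of the sketch is that two independent consumers can each read the same messages, because the log is retained and each reader only moves its own offset. This is what separates Kafka's model from a queue that hands each message out once.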

Resources

Kafka is widely used in production

  • LinkedIn: activity streams, operational metrics, data bus
    • 400 nodes, 18k topics, 220B msg/day (peak 3.2M msg/s), as of May 2014
  • Netflix: real-time monitoring and event processing
  • Twitter: as part of their Storm real-time data pipelines
  • Spotify: log delivery (from 4h down to 10s), Hadoop
  • Loggly: log collection and processing
  • Mozilla: telemetry data