Nathan Marz, Lead Engineer on Twitter's Publisher Analytics team
Tue - 10:15-11:15 AM, Ballroom A
Infrastructure

Storm makes it easy to write and scale complex realtime computations on a cluster of computers, doing for realtime processing what Hadoop did for batch processing. Storm guarantees that every message will be processed. It is also fast: you can process millions of messages per second with a small cluster. Best of all, you can write Storm topologies using any programming language. Storm was open-sourced by Twitter in September 2011 and has since been adopted by numerous companies around the world.

Storm provides a small set of simple, easy-to-understand primitives. These primitives can be used to solve a wide range of realtime computation problems, from stream processing to continuous computation to distributed RPC. In this talk you’ll learn:

  • The concepts of Storm: streams, spouts, bolts, and topologies
  • Developing and testing topologies using Storm’s local mode
  • Deploying topologies on Storm clusters
  • How Storm achieves fault-tolerance and guarantees data processing
  • Computing computationally intensive functions on the fly, in parallel, using Distributed RPC
  • Making realtime computations idempotent using transactional topologies
  • Examples of production usage of Storm
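
To make the first bullet concrete, here is a minimal sketch written as a self-contained, single-process toy in Python rather than against Storm itself. Every class and function name in it (SentenceSpout, SplitBolt, CountBolt, apply_bolt, fields_grouping, run_word_count) is hypothetical. A spout emits a stream of tuples, bolts consume tuples and emit new ones, and the wiring between them forms the topology. The hash-based routing imitates the idea behind a fields grouping: every tuple carrying the same word reaches the same counting task, so each word's running count lives in exactly one place.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# In this toy model a "tuple" is just a Python tuple of field values.
Values = Tuple[object, ...]


class SentenceSpout:
    """Hypothetical spout: the source of a stream, one tuple per sentence."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.sentences = list(sentences)

    def next_tuples(self) -> Iterator[Values]:
        for sentence in self.sentences:
            yield (sentence,)


class SplitBolt:
    """Hypothetical bolt: turns each sentence tuple into one tuple per word."""

    def process(self, tup: Values) -> Iterator[Values]:
        for word in str(tup[0]).lower().split():
            yield (word,)


class CountBolt:
    """Hypothetical stateful bolt: keeps a running count for each word it sees."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, tup: Values) -> Iterator[Values]:
        word = tup[0]
        self.counts[word] += 1
        yield (word, self.counts[word])


def apply_bolt(bolt: SplitBolt, stream: Iterable[Values]) -> Iterator[Values]:
    """Feed every tuple of the input stream to one bolt and chain its outputs."""
    for tup in stream:
        yield from bolt.process(tup)


def fields_grouping(stream: Iterable[Values], tasks: List[CountBolt],
                    field_index: int = 0) -> Iterator[Values]:
    """Route each tuple to the task chosen by hashing one field, so equal
    field values always go to the same task (a toy fields grouping)."""
    for tup in stream:
        task = tasks[hash(tup[field_index]) % len(tasks)]
        yield from task.process(tup)


def run_word_count(sentences: Iterable[str], parallelism: int = 3):
    """Run spout -> split bolt -> parallel count bolts inside one process,
    loosely analogous to trying a topology out in local mode."""
    spout = SentenceSpout(sentences)
    counters = [CountBolt() for _ in range(parallelism)]
    words = apply_bolt(SplitBolt(), spout.next_tuples())
    emitted = list(fields_grouping(words, counters))
    # Merge per-task state only for inspection; no task shares a word.
    totals: Counter = Counter()
    for counter in counters:
        totals.update(counter.counts)
    return totals, emitted, counters


totals, emitted, counters = run_word_count(["the cow jumped", "the man", "The end"])
print(totals["the"], emitted[-1])  # 3 ('end', 1)
```

In Storm itself, a word-count topology like this is declared in Java with `TopologyBuilder` (`setSpout`, `setBolt`, and groupings such as `shuffleGrouping` and `fieldsGrouping`) and can be run in local mode with `LocalCluster` before being deployed to a cluster. The toy above deliberately omits acking, fault tolerance, and distribution.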
