10600 Westminster Blvd
Westminster, CO 80020
Principal Architect with Hortonworks
As a speaker, Oleg has presented seminars at dozens of conferences worldwide, including SpringOne, JavaOne, JavaZone, Jazoon, Java2Days, Scala Days, and UberConf.
This talk explores real-time data ingest into Hadoop, presents the architectural trade-offs, and demonstrates alternative implementations that strike the right balance across the following common challenges:
- Decentralized writes (multiple data centers and collectors)
- Continuous availability and high reliability
- No data loss
- Elasticity when adding more writers
- Bursts in throughput from individual syslog emitters
- Continuous, real-time collection
- Flexible write targets (local FS, HDFS, etc.)
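To make a few of these requirements concrete (no data loss, bursts, flexible write targets), here is a minimal single-collector sketch in Python. It is a hypothetical illustration, not one of the implementations demonstrated in the talk. `WriteTarget`, `LocalFileTarget`, and `Collector` are invented names, and an HDFS target is assumed to implement the same `write_batch` method (not shown). Each event is fsync'd to a local spool file before `accept` returns, so a crash or an unavailable target does not lose it, and events reach the target in batches to absorb bursts.

```python
import json
import os
import tempfile
from typing import List, Protocol


class WriteTarget(Protocol):
    """Hypothetical interface for any sink a collector can flush to (local FS, HDFS, ...)."""

    def write_batch(self, records: List[dict]) -> None: ...


class LocalFileTarget:
    """Hypothetical target: appends records as JSON lines to a local file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def write_batch(self, records: List[dict]) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")


class Collector:
    """Hypothetical collector: spool each event durably, then flush in batches."""

    def __init__(self, spool_path: str, target: WriteTarget, batch_size: int = 100) -> None:
        self.spool_path = spool_path
        self.target = target
        self.batch_size = batch_size
        self.buffer: List[dict] = []

    def recover(self) -> None:
        """Call once at startup: reload events spooled but never flushed (e.g. after a crash)."""
        if os.path.exists(self.spool_path):
            with open(self.spool_path, encoding="utf-8") as spool:
                self.buffer = [json.loads(line) for line in spool if line.strip()]

    def accept(self, emitter: str, message: str) -> None:
        record = {"emitter": emitter, "message": message}
        # Make the event durable on local disk before returning to the emitter.
        with open(self.spool_path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(record) + "\n")
            spool.flush()
            os.fsync(spool.fileno())
        self.buffer.append(record)
        # Batch writes so a burst from one emitter becomes a few larger writes.
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # If write_batch raises, the buffer and spool are kept for a later retry.
        self.target.write_batch(self.buffer)
        self.buffer = []
        # Every spooled event has now reached the target, so the spool can be cleared.
        open(self.spool_path, "w", encoding="utf-8").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = LocalFileTarget(os.path.join(d, "events.jsonl"))
        collector = Collector(os.path.join(d, "spool.jsonl"), target, batch_size=2)
        collector.accept("dc1-web01", "GET /index")
        collector.accept("dc2-web07", "GET /cart")    # triggers a flush
        collector.accept("dc1-web01", "POST /order")  # only in the spool so far
        print(len(collector.buffer))
```

Because a failed or partial `write_batch` leaves the spool intact and is retried, this design gives at-least-once delivery: a target may occasionally receive duplicates, which downstream processing would have to tolerate.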
Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through the applications. That means two things:
- 80% of the data flowing through our applications is, at best, lost in rolling log files and, at worst, never collected; either way, it is never analyzed or accounted for.
- Application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT budgets and have prevented app development teams from keeping pace with the rate of change in the business.
The other 80% of the data is "Event Data", and it can no longer be ignored if you want to stay competitive. Changes to application state are already recorded as a sequence of events in application and middleware logs. Yet because this data has historically held value only for developers, a lot of potentially valuable information is never collected. With Hadoop, we can:
- store and query these events (transaction tracing);
- use the event log to reconstruct the application domain at any point in time (ETL);
- use the same event log to construct new domains we haven't planned for (ELT); and
- automatically adjust our data domains to cope with retroactive changes (???).
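Reconstructing a domain "at any point in time" from an event log amounts to event sourcing: state is a fold over time-ordered events, stopped at a chosen timestamp. The Python sketch below illustrates that idea under stated assumptions. `Event`, `replay`, `address_book`, and `move_count` are hypothetical names, the event shape is invented for illustration, and nothing here is a Hadoop API. The same log is replayed as of a given time for a planned domain, and then folded with a second projection to derive a domain nobody designed the log for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class Event:
    ts: int                # event time, e.g. epoch seconds (assumed)
    entity: str            # e.g. a customer id (assumed)
    kind: str              # e.g. "created", "address_changed" (assumed)
    data: Dict[str, str]   # event payload (assumed shape)


def replay(events: Iterable[Event], initial: S,
           apply: Callable[[S, Event], S], as_of: int) -> S:
    """Rebuild state by folding events with ts <= as_of, in time order."""
    state = initial
    for event in sorted(events, key=lambda e: e.ts):
        if event.ts > as_of:
            break
        state = apply(state, event)
    return state


# Planned domain: each customer's current address.
def address_book(state: Dict[str, str], e: Event) -> Dict[str, str]:
    if e.kind in ("created", "address_changed"):
        return {**state, e.entity: e.data["address"]}
    return state


# Unplanned domain derived later from the same log: how often each customer moved.
def move_count(state: Dict[str, int], e: Event) -> Dict[str, int]:
    if e.kind == "address_changed":
        return {**state, e.entity: state.get(e.entity, 0) + 1}
    return state


log = [
    Event(100, "c1", "created", {"address": "Denver"}),
    Event(150, "c2", "created", {"address": "Westminster"}),
    Event(200, "c1", "address_changed", {"address": "Boulder"}),
    Event(300, "c1", "address_changed", {"address": "Golden"}),
]

print(replay(log, {}, address_book, as_of=250))  # {'c1': 'Boulder', 'c2': 'Westminster'}
print(replay(log, {}, move_count, as_of=300))    # {'c1': 2}
```

Because projections only read the log, a new one can be written later and replayed over existing history without changing how events are captured.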
In this talk, we will demonstrate how capturing all event data could dramatically simplify data collection and management within the enterprise.