Chris Wensel is the founder of Concurrent, Inc., and the author of the Cascading data processing open-source project, an alternative API to MapReduce for Apache Hadoop.
He also co-founded Scale Unlimited, the first professional services and training company focused on Hadoop and “Big Data”, where he mentored and trained companies like Sun Microsystems, Apple, and numerous startups in the Bay Area.
Chris bootstrapped his first Internet startup in the early 1990s, creating an early Web server-side scripting language used in the real estate and insurance verticals. During the late 1990s, Chris focused on distributed agent-based systems, for which he received several patents on distributed computing.
From there he became Chief Architect for the fastest-growing business unit at Thomson Reuters. Just prior to Concurrent, Chris was a Consulting Architect to the TeleAtlas geo-content management group in Belgium.
Chris also advises several startups in the “Big Data” and “Big Audience” technology space.
This session will quickly introduce the Cascading open-source project and how it was used in various projects to overcome problems and bottlenecks particular to large data analytics.
This presentation will cover five different high-profile Hadoop and Cascading projects and the lessons learned from them, then identify the architectural components they have in common. We will then present a summary of Hadoop and its architecture to show why Hadoop was a key technology for these projects, and the design decisions architects should consider when beginning a new Hadoop project.
This talk will cover the Hadoop architecture, with a discussion of how the MapReduce model influenced it.
In this presentation we will discuss the Hadoop architectural components in depth, along with some of the early design decisions that architects and administrators should consider when deploying a Hadoop cluster. We will also touch on the Amazon Elastic MapReduce architecture and how it influences applications.
This talk will introduce the Hadoop MapReduce model and the patterns and algorithms commonly implemented on it to solve recurring problems.
In this presentation we will introduce the MapReduce processing model and many of the common patterns implemented on top of MapReduce to achieve standard processing functionality such as joins and secondary sorting. Finally, we will discuss a few optimizations developers can use when creating raw MapReduce applications, along with their tradeoffs.
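As a rough illustration of the two patterns named above, the following sketch simulates a reduce-side join that relies on secondary sorting. It is plain Python rather than Hadoop code, and everything in it (the sample records, the function names, the 0/1 source tags) is an assumption made up for this example, not material from the talk.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample inputs, invented for illustration only.
USERS = [(1, "alice"), (2, "bob")]
ORDERS = [(1, "book"), (2, "pen"), (1, "lamp")]


def map_users(record):
    user_id, name = record
    # Tag 0 sorts ahead of tag 1, so the user row reaches the reducer first.
    yield (user_id, 0), name


def map_orders(record):
    user_id, item = record
    yield (user_id, 1), item


def shuffle(pairs):
    # Sort on the composite (key, tag) but group on the key alone.
    # This mimics a secondary sort: values arrive at the reducer ordered by tag.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=lambda kv: kv[0][0]):
        yield key, [(tag, value) for (_, tag), value in group]


def reduce_join(key, tagged_values):
    # Because of the secondary sort, the user name (tag 0) is seen before
    # any order (tag 1), so the reducer never buffers the orders.
    name = None
    for tag, value in tagged_values:
        if tag == 0:
            name = value
        elif name is not None:
            yield key, name, value


def run_join(users, orders):
    mapped = [kv for record in users for kv in map_users(record)]
    mapped += [kv for record in orders for kv in map_orders(record)]
    return [row
            for key, values in shuffle(mapped)
            for row in reduce_join(key, values)]


if __name__ == "__main__":
    for row in run_join(USERS, ORDERS):
        print(row)
```

The tradeoff in this design is that every record from both inputs passes through the sort and shuffle step. In exchange, the reducer only has to remember one value per key.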