Big data - tidbits of knowledge (feed updated 2024-02-19)<br> Big Data solutions architecture
<p>
This space collects what I have learned during my journey into Big Data. It covers the different technologies I encountered in that space while working with different customers.
</p>
<p>
I will post my findings (mostly technical) along with comments and thoughts to help others, the same way I have found solutions by looking at other people's resources.
</p>
p.s.: music-related tidbits <a href="http://mattlieber.bandcamp.com/">here</a>
<br />
Author: Anonymous (<a href="http://www.blogger.com/profile/04037423523500875654">Blogger profile</a>)<br />
<b>Hackathon project - the Parking Avoidr!</b> (published 2018-10-19)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
Here is a video of my latest project, done over 4 days during a hackathon - let me know your thoughts!<br />
<a href="https://drive.google.com/file/d/1R0rkQXdisPBqBZehoSxHOFkz10AM2EIg/view?usp=sharing">https://drive.google.com/file/d/1R0rkQXdisPBqBZehoSxHOFkz10AM2EIg/view?usp=sharing</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO81viuSDbSk71jfT5haqET-90O7gHXyK5zXPWKpJLfS3hRbZfl2Tu4LNSHPLRA_mXdfPaHBc_3pI8rUmT6CN0Fc0b7L9Cafzgf01Jd-TMb6kKlO64LpBSoLHpMvtelshYWIGbTXDQqc8b/s1600/IMG_5027.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1200" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO81viuSDbSk71jfT5haqET-90O7gHXyK5zXPWKpJLfS3hRbZfl2Tu4LNSHPLRA_mXdfPaHBc_3pI8rUmT6CN0Fc0b7L9Cafzgf01Jd-TMb6kKlO64LpBSoLHpMvtelshYWIGbTXDQqc8b/s320/IMG_5027.jpg" width="240" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi06ZpgawQlPU7RQBqUZwDG2Zw3ze0Jj25bu20lQ1pcL4MzXWBzIh34-XWvj0Du2xdIK9TcEvfdTsHYGe_8lkut1XpnvO1dMVzgvqIgbu2ACj1wieh_PlK5MoAAya85hEz8OOX_6a5biRrV/s1600/Screen+Shot+2018-10-18+at+3.20.40+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="792" height="242" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi06ZpgawQlPU7RQBqUZwDG2Zw3ze0Jj25bu20lQ1pcL4MzXWBzIh34-XWvj0Du2xdIK9TcEvfdTsHYGe_8lkut1XpnvO1dMVzgvqIgbu2ACj1wieh_PlK5MoAAya85hEz8OOX_6a5biRrV/s320/Screen+Shot+2018-10-18+at+3.20.40+PM.png" width="320" /></a></div>
<br /></div>
<b>Overview of the Scrum Master role</b> (published 2018-05-24, updated 2018-05-30)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
These notes are from a training I recently took on being a <b>Scrum Master</b>. The role, processes, and definitions are already clearly documented in a lot of places, so I won't go over them. Rather, I want to complement these standard resources with the notes below, which I took from people's direct experiences and the examples shared during my company training. Hopefully they will be a helpful complement to the official Scrum Master training and slides.<br />
<br />
<h4 style="text-align: left;">
Role of a Scrum master</h4>
The Scrum master transforms the individuals of a team into a high-performing, value-delivering team, and guides it towards achievable plans. How? By removing impediments, championing quality, and detecting questionable commitments.<br />
The goal of the team is to reach a <b>transparent team velocity</b> to enable predictability.<br />
In practice, the Scrum master usually serves as the <b>central point of contact</b> within the team. For example, some engineers reported they were usually too shy to talk to a different team regarding technical dependencies, so the Scrum master will usually take that on.<br />
However, the Scrum master is not a people manager and only serves the team. He/She is also an active listener, and sometimes needs to convince team members of the path forward, without necessarily just pushing the Scrum process.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9Ycp4B28XVzZETEI-mex41Tc1bfuM7EIIS7jejYr7bSx9pB39iPKxI8v9TMBiwaGRczIx_-KJEd3jhgXcbVxRk0B8ds2imgq4UXnjfw3RxwUUbqr0Z3NgYP8sQFwA-0gO_j5EIbNCAOxa/s1600/Screen+Shot+2018-05-24+at+5.05.14+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="249" data-original-width="366" height="217" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9Ycp4B28XVzZETEI-mex41Tc1bfuM7EIIS7jejYr7bSx9pB39iPKxI8v9TMBiwaGRczIx_-KJEd3jhgXcbVxRk0B8ds2imgq4UXnjfw3RxwUUbqr0Z3NgYP8sQFwA-0gO_j5EIbNCAOxa/s320/Screen+Shot+2018-05-24+at+5.05.14+PM.png" width="320" /></a></div>
<br />
<h4 style="text-align: left;">
Communication</h4>
<div style="text-align: left;">
How to perform the above? Through good communication. The Scrum master, as well as the other people on the team, should practice<b> focused and active listening</b> with other members of the team, while keeping an awareness of the whole environment. This matters because people usually tend to focus on <i>how to</i> answer while a person is speaking, rather than being really <i>attentive</i>.</div>
How does this work when a person is working remotely, as is often the case? It should be mandated that they turn their video camera on.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSkyRfcHDyBpruYaxdqCRGg5QW7D2GwAx__cBp20INBXhbsCwb64lODc2qECnOajA9AeaELIxO5ATRX_FOMhqY0Isk-WQpINt_1Xv_OJT_KStw-Lexq7KXhS_YR8wWYrGg6B3_x9QqMYkh/s1600/Screen+Shot+2018-05-24+at+5.04.07+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="365" data-original-width="496" height="235" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSkyRfcHDyBpruYaxdqCRGg5QW7D2GwAx__cBp20INBXhbsCwb64lODc2qECnOajA9AeaELIxO5ATRX_FOMhqY0Isk-WQpINt_1Xv_OJT_KStw-Lexq7KXhS_YR8wWYrGg6B3_x9QqMYkh/s320/Screen+Shot+2018-05-24+at+5.04.07+PM.png" width="320" /></a></div>
<br />
<h4 style="text-align: left;">
Solving a problem</h4>
The Scrum master should use <b>open-ended questions</b> when trying to look for a solution. "<i>Why</i>" questions should be avoided, as they place people in the past, on the defensive. Rather, the solution should be looked for on the path forward. As such, powerful questioning should be used to send people in the direction of discovery. I have seen this line of questioning emphasized in other communication trainings as well, as a way to defuse conflicts; e.g. in manager-employee meetings, the manager/leader should support and help the subordinate by using powerful questions to explore and guide possibilities and resolutions of a problem, rather than use a more authoritative approach.<br />
<div style="text-align: left;">
To this effect, there is a clear delineation between a <b>mentor and a coach</b>: a mentor is an expert in their field; a coach, in turn, helps drive people towards answers.<br />
<br /></div>
<div style="text-align: left;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDse2IaqocJDc-S118iPIqYJITbTopQzlOuNWiXrjgc6RnUX2PYn0L4RTPIJueNH3xt7myoc76MAHguD3qZwiQxoAmcWVw53Wvasb76SrOv8Sbn3MS5wQ34JREGkv5oKbw156lpMywnwjR/s1600/Screen+Shot+2018-05-24+at+5.01.36+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="126" data-original-width="238" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDse2IaqocJDc-S118iPIqYJITbTopQzlOuNWiXrjgc6RnUX2PYn0L4RTPIJueNH3xt7myoc76MAHguD3qZwiQxoAmcWVw53Wvasb76SrOv8Sbn3MS5wQ34JREGkv5oKbw156lpMywnwjR/s1600/Screen+Shot+2018-05-24+at+5.01.36+PM.png" /></a></div>
<br /></div>
<h4 style="text-align: left;">
Role delineation</h4>
<div style="text-align: left;">
The Scrum master (SM) is most of the time an Engineering manager in practice. The Product Owner (PO) is usually a Product Manager or Technical program manager, and the team is often comprised of 3-4 engineers in an optimal situation. Sometimes, roles are <b>combined</b>, but that leads to dysfunctional teams most of the time! The PO's drive is to get as many features as possible in the product as she owns the backlog of prioritized stories, bugs, and technical debt, whereas the SM is capacity-aware and is focused on project and time management of the stories.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
In big companies, role delineation is usually complicated by the fact that team members are shared across different teams: e.g. UX, QA, and Security teams work across multiple ongoing projects across the organisation. Some of these resources are either embedded 100% of the time in a project/team for some amount of time, or just act as coaches to the team part of the time. </div>
<div style="text-align: left;">
<br /></div>
<h4 style="text-align: left;">
Meetings/Ceremonies</h4>
<div style="text-align: left;">
There is a common complaint (including from me) that Scrum takes too much time in terms of meeting time. These meetings and ceremonies are, to recap:<br />
<br /></div>
<div style="text-align: left;">
-<b><u>Sprint planning</u></b>: to commit to the fixed scope of work. This is where the team <b>negotiates</b> the work to be done in the upcoming sprint, according to its capacity. Some teams have problems with this, and end up carrying over tasks and stories from previous sprints, as they don't understand their team capacity (sometimes due to constant changes in the team).</div>
<div style="text-align: left;">
It is really interesting to measure and compare what was initially agreed upon and committed to at the beginning of a Sprint versus what was delivered at the end; to this effect, our tools take a snapshot of the initial commitment.<br />
This meeting should take 1 hour in total.<br />
<br /></div>
<div style="text-align: left;">
-<b><u>Sprint standup</u></b>: run daily, <span style="background-color: white; font-family: "times" , "times new roman" , serif;">helps set the context for the coming day's work, with the standard 3 questions. </span>The general rule of thumb is to try to be <b>effective</b> in running the meeting by not chit-chatting, moving bigger conversations to the "parking lot", and getting everybody's voice heard. Even though in my experience these meetings usually run long, they can be done effectively in 15 minutes. Some teams skip this on a 'no meeting' weekday, or do it over Slack, which is fine.<br />
Some finer points about enabling the conversation: this should be among team members, not in a 1-1 Scrum master-team member engagement fashion. Unfortunately there is a tendency to do the latter when the Engineering manager acts as the Scrum master, and the meeting becomes a status report meeting.<br />
<br />
<div style="text-align: left;">
-<b><u>Sprint review</u></b>: <span style="font-size: medium;">at the end of the sprint, the team has <span style="background-color: white;"><span style="font-family: "times" , "times new roman" , serif;">delivered a potentially shippable product increment. </span></span><span style="background-color: white; font-family: "times" , "times new roman" , serif;">During this meeting, the Scrum team shows what they <b>accomplished</b> during the sprint. The demo is not and should not be a marketing demo. Typically it takes the form of a demo of the new features, usually to the key stakeholders.</span></span></div>
<span style="background-color: white; font-family: "times" , "times new roman" , serif;"><br /></span>
-<b><u>Backlog grooming and refinement meeting</u></b>: this meeting is to<b> look ahead</b> to the next sprint and plan accordingly. Our coach said that skipping this meeting is hazardous. It should happen roughly in the middle of the ongoing sprint, to ensure the next sprint planning is on track. Every User story that is older than 6 months should be removed.<br />
<br />
- <b><u>Retrospective</u></b>: this should not be a place where people <b>complain</b>! Instead, constructive criticism should be used to further improve the process in an incremental way, within the team's control. This meeting should generate insights. One implementation of this is to take Slack polls on the team, in the interest of time. Retro time should also make use of the <b>Root Cause Analysis</b> process to understand how something went wrong and how to remedy it. An example: this public monument is very dirty, how do we take care of the situation? Why is it dirty? Pigeon poop caused it. Why? The pigeons come at night, when no one is there. Remedy: change the light schedule on the monument. This was not an obvious change to make in order to fix the situation, and is an example of not jumping to a solution immediately.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhilUpC0qKllekdESv9I7Ahu6Jy8W1bDPatUoyzknf-FqHF7K8tEUjgmIfTjB3G0Z1Hq9dalT2aOKXarEebMlgHhP_l192MGpSFRB6h_8XB3Qg-8ICPL2TfbPm3uDswO3Zj6ZFPj_3HVCJx/s1600/Screen+Shot+2018-05-24+at+5.00.34+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="319" data-original-width="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhilUpC0qKllekdESv9I7Ahu6Jy8W1bDPatUoyzknf-FqHF7K8tEUjgmIfTjB3G0Z1Hq9dalT2aOKXarEebMlgHhP_l192MGpSFRB6h_8XB3Qg-8ICPL2TfbPm3uDswO3Zj6ZFPj_3HVCJx/s1600/Screen+Shot+2018-05-24+at+5.00.34+PM.png" /></a></div>
<br />
<br />
Anyway, to conclude on retrospectives: Scrum is centered on a feedback communication loop, and the retrospective is exactly such a meeting, meant to refine the process. </div>
<div style="text-align: left;">
<br />
To close, a note about being <b>meeting-heavy</b>: our coach emphasized that when Scrum is done well, 80% of a team member's time should be hands-on coding, and only the remaining 20% spent in meetings. Again, Scrum emphasizes communication over process.<br />
<br />
<h4 style="text-align: left;">
User stories</h4>
These are the basic <b>units of scope</b> for a Scrum project. Each one should describe the business value, and talk about Who/What/Why. The template of a User story should be: "As a &lt;persona/type of user&gt;, I want to &lt;goal&gt; so that &lt;business value&gt;." Of note, it is important that the organization has a fixed list of available personas, to be consistent across the company's product offerings.<br />
<br />
The definition of a <b>Ready status and Done status</b> of a User story (and its dependent items, e.g. task, sprint, etc.) depends on the team, but there are usually some common threads about being clear, reviewed, testable, etc. The acceptance criteria will be used to determine if the story is fully implemented.<br />
The points <b>estimation</b> given to evaluate the story's complexity should not be precise, but rather give a rough idea of what the work will entail. Best practices recommend comparing stories to one another and to past work, used as baselines, rather than pointing each story on its own. <b>Playing poker</b>, i.e. everyone on the team giving an estimate at the same time, avoids anchoring on the HiPPO (highest paid person's opinion) on the team, which would make the decision undemocratic.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXbsc0EWUGt_fZitG3D-_6YFRfIH7uytZ10CSZcsmxusEz2GJd8go827NhRooDEzSGkI0N23P6mru8d1s6Mew3azirReP8Vs0ZNqVhBirQJAA9q4CT1wxTwFrns1cZolStThSRZHUt4P8a/s1600/Screen+Shot+2018-05-24+at+5.02.33+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="128" data-original-width="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXbsc0EWUGt_fZitG3D-_6YFRfIH7uytZ10CSZcsmxusEz2GJd8go827NhRooDEzSGkI0N23P6mru8d1s6Mew3azirReP8Vs0ZNqVhBirQJAA9q4CT1wxTwFrns1cZolStThSRZHUt4P8a/s1600/Screen+Shot+2018-05-24+at+5.02.33+PM.png" /></a></div>
<br />
A story should not necessarily be given to the <b>domain expert</b> in residence. As good practice, Agile encourages spreading knowledge across all team members; hence the story should be pointed as if any generic engineer could take on the task. A corollary is that pointing should be performed in terms of size, not time: the work could be done by different people, and precise time estimates will generally be wrong in the first place. Allowing the sizing to be coarser (T-shirt size, Fibonacci scale) actually makes the process smoother.<br />
"Mind the product" by D. Pink is a good reference book on Product management.<br />
<h4 style="text-align: left;">
Spikes</h4>
Spikes are special units of work for cases where <b>"we don't know what we don't know"</b>. They are scheduled within the Sprint as discovery tasks that usually take 1-2 days. The outcome is not necessarily a POC, but more of a quick learning output, which should lead to the creation of a new User story in the next sprint.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOwrzuogazFV-2xg8s7oBy1mLNNDz1r2LVnOKt_PIAtRDNimKuTZc_mQmFBHU2LPiveaKLR-63co4lGTFfx9qNqsClLyDViUTKMCEQldWR_vHvLRhO8gXRExBICCJPR6BmV8TgbhLOnf0O/s1600/Screen+Shot+2018-05-24+at+5.03.11+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="253" data-original-width="58" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOwrzuogazFV-2xg8s7oBy1mLNNDz1r2LVnOKt_PIAtRDNimKuTZc_mQmFBHU2LPiveaKLR-63co4lGTFfx9qNqsClLyDViUTKMCEQldWR_vHvLRhO8gXRExBICCJPR6BmV8TgbhLOnf0O/s1600/Screen+Shot+2018-05-24+at+5.03.11+PM.png" /></a></div>
<br />
<h4 style="text-align: left;">
Releases</h4>
A lot of teams in our organisation draw <b>confidence lines</b> (optimistic|pessimistic) on the sprint release, and speak with probabilities, and let the stakeholders make the decisions on the release status.<br />
<h4 style="text-align: left;">
Conclusion</h4>
The Scrum process is important as it allows a team to move collectively and collaboratively towards a common goal. The ultimate goal is continuous delivery, with every checkin being potentially shippable. So, rather than having a hard model of, say, 3 releases a year across the organisation where everyone has to align, move to a more flexible model of adaptive planning that marries well with today's architectures, such as micro-services.</div>
</div>
<b>Quick overview of Kubernetes</b> (published 2018-05-16, updated 2018-05-21)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<br />
These are the notes that I took while learning more about Kubernetes.</div>
<div style="text-align: left;">
<br /></div>
<h4>
High-level overview</h4>
Most software today runs multiple processes spread across distributed systems, and keeping track of all this is a challenge. At a high level, Kubernetes helps run your software and processes on a cluster of computers, and runs them as one entity. Kubernetes manages the processes and ensures they stay running.<br />
Kubernetes (K8s) originated at Google and is inspired by its internal Borg system. Google had already built this kind of infrastructure, and released K8s as an open-source project.<br />
<br />
<h4 style="text-align: left;">
Containers</h4>
K8s runs Docker as the primary container format (among other, less popular ones). The container gives the developer a hermetically sealed box for our processes. The context for these processes is always the same, which allows the container to be run on different machines and always give the same result. K8s' role is to keep track of these processes, ensure they stay up, and help them find each other.<br />
K8s can be run on different environments: in any major cloud provider (GCP, Azure, AWS), but also on premise or in a hybrid environment, with consistency, as K8s is open source software. So in theory there is no vendor lock-in, and K8s workloads can be moved (gradually or not) from one provider to another.<br />
<h4 style="text-align: left;">
Setup</h4>
K8s is set up declaratively, in a config file (e.g. version of the software to run, number of instances, desired state, etc.). The number of processes acts as a dial that can be turned for scaling purposes, simply by changing the config file.<br />
<br />
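To make the declarative idea concrete, here is a minimal Python sketch of a reconcile step: it compares a desired state (what the config file declares) with what is actually running and lists the actions needed to close the gap. The `reconcile` function, the dictionary format and the app names are hypothetical illustrations, not the K8s API.

```python
def reconcile(desired, running):
    """Return the actions needed to move `running` toward `desired`.

    Both arguments map an app name to a replica count (hypothetical format).
    """
    actions = []
    for app, want in desired.items():
        have = running.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    # Anything running that is no longer declared gets stopped entirely.
    for app, have in running.items():
        if app not in desired:
            actions.append(("stop", app, have))
    return actions


desired = {"frontend": 3, "backend": 2}  # stands in for the config file
running = {"frontend": 1, "backend": 2, "old-job": 1}
print(reconcile(desired, running))
```

Scaling up is then only a matter of changing a number in `desired` and running the same comparison again.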
<h4 style="text-align: left;">
Schedulers</h4>
Scheduling in a distributed environment means consistently running copies of an instance. The scheduler places each service on a machine that is not too busy. As a metaphor, this equates to playing a multi-dimensional game of Tetris with resources: workloads come as oddly shaped combinations of CPU / memory / disk requirements that have to be fitted onto the nodes you run your software on, for maximum efficiency.<br />
<br />
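The Tetris metaphor can be sketched in a few lines of Python. This is a deliberately naive placement rule, assuming each node advertises its free resources and each workload states what it needs; the `pick_node` function, the resource names and the "most free CPU wins" tie-break are assumptions for illustration, not how the K8s scheduler actually scores nodes.

```python
def pick_node(nodes, request):
    """Pick the node with the most free CPU among those that fit the request.

    `nodes` maps a node name to its free resources; `request` lists what the
    workload needs. Returns None when nothing fits.
    """
    fitting = [
        name for name, free in nodes.items()
        if all(free.get(res, 0) >= amount for res, amount in request.items())
    ]
    if not fitting:
        return None
    return max(fitting, key=lambda name: nodes[name]["cpu"])


nodes = {
    "node-a": {"cpu": 1.0, "memory_gb": 8, "disk_gb": 50},
    "node-b": {"cpu": 4.0, "memory_gb": 2, "disk_gb": 100},
    "node-c": {"cpu": 2.0, "memory_gb": 16, "disk_gb": 20},
}
request = {"cpu": 1.5, "memory_gb": 4, "disk_gb": 10}
print(pick_node(nodes, request))
```

Here `node-a` lacks CPU and `node-b` lacks memory, so only `node-c` fits: the request has to match on every dimension at once, which is what makes the game multi-dimensional.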
Rolling updates can be managed this way: the developer can roll out a new definition of her software for a container, and slowly add instances of the new version of her app while dialing down the older version until it is fully replaced. So atomic upgrades are possible, as well as rollbacks, in a seamless way.<br />
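As a rough illustration of the dial-up / dial-down mechanics, the Python sketch below steps through a rolling update in batches. The `rolling_update_steps` generator and its `batch` parameter are hypothetical names; real rollouts also wait for health checks between steps, which this sketch leaves out.

```python
def rolling_update_steps(replicas, batch=1):
    """Yield (old_count, new_count) after each step of a rolling update."""
    old, new = replicas, 0
    while old > 0:
        step = min(batch, old)
        new += step  # bring up instances of the new version first
        old -= step  # then retire the same number of old instances
        yield old, new


for old, new in rolling_update_steps(4, batch=2):
    print(f"old={old} new={new}")
```

A rollback is the same procedure with the roles of the old and new versions swapped.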
<div>
Interestingly, existing data infrastructures like Hadoop, which tried to do it all (including managing the infrastructure), now <a href="https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/">go through Docker</a>, running a dockerized application on Apache Hadoop YARN.</div>
<h4 style="text-align: left;">
Service discovery</h4>
However, K8s is more than a scheduler, as it also performs service discovery. Services are often tagged (e.g. #backend, #frontend, etc.), and K8s intelligently routes traffic to the services you target by tag, which is a very powerful concept.<br />
An example of this is a load balancer, which is also managed by K8s. The different parts of your system are usually given static names, and can thus be handled easily.<br />
<br />
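Targeting services by tag can be pictured as a simple matching rule: a selector is a set of key/value pairs, and a service matches when it carries all of them. The Python sketch below assumes that model; the `select` function, the service names and the `tier` / `env` keys are made up for the example.

```python
def select(services, selector):
    """Return names of services whose labels contain every selector pair."""
    return sorted(
        name for name, labels in services.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


services = {
    "web-1": {"tier": "frontend", "env": "prod"},
    "web-2": {"tier": "frontend", "env": "staging"},
    "api-1": {"tier": "backend", "env": "prod"},
}
print(select(services, {"tier": "frontend"}))
print(select(services, {"tier": "backend", "env": "prod"}))
```

A load balancer built on this idea never needs a hard-coded list of instances; it only needs the selector.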
<h4 style="text-align: left;">
Storage</h4>
From its inception, <a href="http://thenewstack.io/tag/Docker/">Docker</a> encouraged the design of stateless services. <br />
Persistence and statefulness are an afterthought in the world of containers. This design works in favor of workload scalability and portability. <br />
It is one of the reasons why containers are fueling cloud-native architectures, microservices, and web-scale deployments.<br />
<div>
So, given that either the host can abruptly terminate or the container itself can fail, the state usually needs to be stored somewhere else, via a networked volume that is independent of the host and the container.</div>
<div>
<br /></div>
<div>
A pod is the logical unit of Deployment in Kubernetes. </div>
<div>
A K8s volume is attached to the pod that encloses it. Data in the volume is preserved across container restarts. If the pod dies, the volume is gone. The K8s volume is a directory with some data that is accessible to all the containers of a pod.<br />
<br />
<h4>
Google Container Engine</h4>
<br />
GKE (Google Container Engine, since renamed Google Kubernetes Engine) is the hosted version of K8s managed by Google. For example, K8s upgrades and cluster management are handled for you. What you get out of a hosted environment is:<br />
- Dynamic creation and removal of machines<br />
- APIs for controlling the cluster and the network.<br />
<br />
Autoscaling is a major feature of GCP (and is also offered in other providers).<br />
<br /></div>
</div>
<b>Review of the Strata San Jose 2016 conference in the context of IoT</b> (published 2016-04-04, updated 2016-04-05)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<h2 style="text-align: left;">
IoT at Strata San Jose 2016</h2>
<h3 style="text-align: left;">
Introduction</h3>
I had a different focus this year in attending Strata San Jose - not following the usual Hadoop-related news or products, but instead focusing on the Internet of Things (IoT) and by extension, streaming technology. As such, it was pretty clear from the different talks and in talking to attendees that batch-oriented processing is becoming less of a focus and an artifact of history, as <a href="https://www.oreilly.com/people/09f01-tyler-akidau" itemprop="name" style="background-color: white; border-bottom-color: rgb(129, 0, 28); box-sizing: inherit; color: #81001c; cursor: pointer; font-family: Guardian, open-sans, Helvetica, Arial, sans-serif; font-size: 15px; line-height: 21px; outline: 0px; text-decoration: none;">Tyler Akidau</a> says.<br />
I attended as many IoT-related talks as I could. In capturing my findings below, I separated them into two parts: the business-level overview with the key points that need to be architected in an IoT system, and the technical streaming frameworks of choice. I tried to blend the insights from the speakers together, and to mention which talk they come from.<br />
<br />
<h3 style="text-align: left;">
The talks</h3>
<div>
I first attended the talks by Dunning regarding <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/47291">messaging systems</a> and the <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/46818">No Lambda talk</a>, and took a look at what <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/50442">Pivotal says about IoT</a>. I then attended <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/46954">Moty's talk from Intel</a> and the <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/47053">talk about robots from Microsoft</a>.</div>
<div>
On Thursday I attended the second <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/50688">Intel talk</a>, the talk <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/50433">from Ryft</a>, and Capital One's <a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/47047">talk regarding their architecture</a>, which was unfortunately at the same time as Twitter talking about Heron and Google talking about Dataflow. I then had to attend the talk (like a million other people) about "<a href="http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/47190">Streaming done right</a>" by Flink!</div>
<div>
<br /></div>
<h3 style="text-align: left;">
The state of IoT</h3>
<div>
Everyone addresses the state of the Internet of Things market first, with the same figure: 50 billion devices projected to be connected in 2020. The usual explanations behind the growth in IoT are known: </div>
<div>
- <b>Moore's law</b>, making the chips smaller with more components every year</div>
<div>
- <b>Price decrease</b> of components</div>
<div>
- <b>Distributed systems</b> that parallelize tasks across multiple machines.<br />
- <b>AI </b>algorithms are entering a golden age.<br />
- Data <b>growth</b>/explosion<br />
<br />
<h4 style="text-align: left;">
The hype curve?</h4>
Is IoT all hype? Well, most sensors (85%) are unconnected as of now, but 5.5 million devices will get connected daily in 2016. Also, the "data as the new oil" adage seems to continue to take hold: more and more data gets created, and its volume doubles every 2 years. So it seems like IoT is no hype. In addition, the use cases are many.<br />
<br />
<h4 style="text-align: left;">
IoT systems definition</h4>
The use cases are endless and span all verticals, with common themes: energy savings, security, and monitoring.<br />
Intel's definition of IoT is pretty fair: IoT use cases are usually time-series oriented, and are made up of 3 silos:<br />
- The "<b>things</b>": end-points at the edge, like cars, wearables, gateway networks, sensors. This is where data acquisition happens.<br />
- The <b>network</b><br />
- The <b>cloud</b>: where most of the analytics takes place. This is where real-time visualization, analysis, alerts take place.<br />
<div>
<br /></div>
<div>
The actual setup of an IoT system happens in its process management workflow layer, where the rules are written, usually in a SQL-like descriptive language.</div>
<div>
The end goal of an IoT system is to create a predictive, and sometimes prescriptive platform.</div>
<h4 style="text-align: left;">
The importance of edge nodes</h4>
Interestingly, Intel says that some of the analytics in an IoT architecture needs to happen at the edges (Ryft said the same thing):<span style="background-color: white;"> "<span style="font-family: inherit; font-size: 16px; line-height: 22px; white-space: pre-wrap;">40% of IOT generated data will be stored, processed, analyzed &amp; acted upon at the edge."</span></span><br />
The new velocity of data needs to be supported by providing near real-time responsiveness between capturing the data and driving an action. Ryft, as an example, talked about brick-and-mortar retail stores that need to correlate data about a shopper as fast as possible while they are in the store, in order to send them a relevant coupon.<br />
Also, edge compute processing is required for handling intermittent connectivity from decentralized data capture. Edge compute must help filter signal from the noise in order to optimize bandwidth and responsiveness. Edge nodes should be able to receive updates from the cloud about what to filter out and about changing definitions of anomalies or data of interest.<br />
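The filtering role of an edge node, including rule updates pushed from the cloud, could look roughly like the Python sketch below. The `EdgeFilter` class, its threshold band and the sample readings are all hypothetical; real edge software would add buffering for the intermittent connectivity mentioned above.

```python
class EdgeFilter:
    """Forward only readings outside a normal band; the band can be updated."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def update_rules(self, low, high):
        # Called when the cloud pushes a new definition of "data of interest".
        self.low, self.high = low, high

    def filter(self, readings):
        # Keep only readings outside the normal band [low, high].
        return [r for r in readings if r < self.low or r > self.high]


edge = EdgeFilter(low=10.0, high=30.0)
samples = [12.0, 35.5, 8.0, 22.1]
print(edge.filter(samples))

edge.update_rules(low=5.0, high=40.0)
print(edge.filter(samples))
```

With the first band only two of the four samples leave the edge; after the update none do, which is the bandwidth saving the speakers were describing.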
<br /></div>
<div>
<h4 style="text-align: left;">
Data Direction</h4>
<div>
Data flow generally needs to be bidirectional in an IoT system: from the edge nodes to the centralized cloud for aggregation, and back out. Not all IoT frameworks offer this at the moment, I believe.</div>
<div>
<br /></div>
<h4 style="text-align: left;">
Latency</h4>
<div>
Ryft emphasized the standard question of "what is the definition of near-real time?"; in other words, near-real time can mean anything from a couple of milliseconds to a 5-minute response, depending on the use case. Capital One mentioned that they had a throughput of about 2,000 events/second, which may sound small in the context of Big Data, but is not when you consider that these events become features in a Machine learning model that grows to 10M data dimensions. The Capital One team was looking for sub-40ms latency in their real-time fraud analysis use case.</div>
<h4 style="text-align: left;">
The importance of Machine learning</h4>
</div>
<div>
Microsoft's talk was really about Machine learning. The analytic output is no longer business intelligence (BI) aimed at human consumption, but increasingly machine learning for near real-time performance that optimizes the output of an ecosystem of smart devices. The new pipelines of data need to support predictive analytics and machine learning, with a flow back to the source devices, in order to optimize how an ecosystem of connected devices operates. The speaker framed the choice of the proper ML algorithm around 5 questions:<br />
1/ <b>How much?</b> E.g. "What will be the total of sales next week?" aka regression algorithm. <br />
2/ <b>Which category?</b> E.g. "Is it a cat or a dog?" aka classification algorithm.<br />
3/ <b>Which groups?</b> E.g. "Which shoppers have similar tastes?" aka clustering.<br />
4/ <b>Is it weird?</b> E.g. "Is this pressure unusual?" aka anomaly detection.<br />
5/ <b>Which action to take?</b> E.g. "Should I brake or accelerate in response to that yellow light?" aka reinforcement learning.<br />
<br />
Challenges:<br />
- The algorithm assumes the world doesn't change.<br />
- Sensors/actors could themselves change.<br />
- Reinforcement algorithm doesn't handle keeping goals.<br />
- Typically the algorithm takes a lot of time to learn.<br />
- Also, it doesn't always scale.<br />
<br />
The speaker postulated that this is a major problem in IoT: we have all the components like the actuators/sensors, but are still missing the central brain to make this work.<br />
<br /></div>
<div>
More generally, Machine Learning was a major addition in most IoT systems described in the talks, reducing the need for manual rules and enabling advanced predictive analysis. An interesting point was that ML makes it possible to pinpoint slow changes over time, as opposed to "simple" anomalies (mentioned by Intel). The way to do this is to look at a combination of sensors, as opposed to a single one.<br />
<br />
<h4 style="text-align: left;">
Systems evolution</h4>
The systems built for IoT usually increase in complexity as they become more mature: they go from being able to visualize what is happening with the connected objects, to simple alerting, to complex rules, to predictive and then prescriptive analytics. The best systems (a la Microsoft) expose a model-as-a-service, ready to deploy. Intel talked about a marketplace of components in their solution, acting as an analytics toolkit.</div>
<h3 style="text-align: left;">
Deep dive into architectures</h3>
<div>
<div style="text-align: left;">
<span style="font-family: inherit;">Ted Dunning emphasized that <span style="background-color: white; line-height: 25px;">developers need a reliable way to move data as it is generated across different systems, one event at a time; this is generally done via Kafka. Kafka was present in almost all the IoT architecture diagrams shown.</span></span><br />
<br /></div>
</div>
<div>
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">However, the real-time processing frameworks differed widely across companies; Intel's Moty was a big proponent of Akka for bidirectional communication between devices and the central processing, while Capital One's Ganelin preferred Apache Apex. He said they used some kind of "scientific method" approach to come to this conclusion, which is refreshing. Their criteria for a real-time processing platform were:</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">- <b>Performance</b>: under their specific conditions, they needed sub-40ms latency.</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">- <b>Roadmap</b>: they evaluated the future roadmap of the product. For example, Databricks has said that Spark Streaming may consider non-micro-batching in a future version.</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">- <b>Community: </b>community support had to be strong, development cannot happen in a vacuum.</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">- <b>Enterprise readiness</b>: the framework of choice had to support enterprise features, like security.</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;"><br /></span></span>
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">With that, Ganelin quickly listed what went wrong with frameworks other than Apex:</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">- <b>Spark streaming</b> is a non-starter as it uses micro-batching and thus is too slow.</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;">- In <b>Storm</b>: lack of scalability (non-elastic nodes), poorly supported failure handling, at-least-once processing guarantees, non-dynamic topologies. Acknowledgements are sent from spout (source) to sink, which works well until a failure occurs: in that case the rollback starts from the last good tuple, which delays the new data and creates cascading failures. Twitter doesn't use Storm anymore, and created Heron (different talk at the same time at Strata :-( ). Also the community support seems to be waning, even though Hortonworks says they will support Storm.</span></span><br />
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;"><br /></span></span>
- <b>Flink</b>: adds usability over Storm and Spark Streaming. Failure handling doesn't use acknowledgements but checkpointing instead (like Spark Streaming), which works better and keeps stateful snapshots. It also has exactly-once processing guarantees. However, it still lacks dynamic topologies, and is a young project with a lack of support at this point (April 2016).<br />
<br />
- <b>Apache Apex</b>, on the other hand, is a real-time computation system based on YARN. Operability is a first-class citizen, with Enterprise features. It supports dynamic topology modifications and deployments. It is the only project that has durable messaging queues between operators, checkpointed in memory as well as on disk. It supports dynamic scaling, where you choose between latency and throughput.<br />
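<br />
The difference between the at-least-once and exactly-once guarantees mentioned in this list can be illustrated with a small sketch. It is not how Storm, Flink or Apex implement their guarantees; the names Payment and IdempotentSink are hypothetical, and the sketch only shows why replayed data matters to the end result:<br />
<pre>
// Hypothetical message carrying a unique id assigned at the source.
final case class Payment(id: Long, amount: Int)

// Sink that remembers the ids it has already applied, so that a replayed
// message does not change the total a second time.
final class IdempotentSink {
  private var seen = Set.empty[Long]
  private var total = 0L

  def write(p: Payment): Unit =
    if (!seen.contains(p.id)) {
      seen += p.id
      total += p.amount
    }

  def result: Long = total
}

object DeliveryDemo {
  // At-least-once delivery: message 2 is replayed after a simulated failure.
  val delivered = Seq(Payment(1L, 10), Payment(2L, 20), Payment(2L, 20), Payment(3L, 5))

  // Naive sink: the replayed message is counted twice.
  val naiveTotal: Long = delivered.map(_.amount.toLong).sum

  // Idempotent sink: the replayed message is ignored.
  val sink = new IdempotentSink
  delivered.foreach(sink.write)
  val dedupedTotal: Long = sink.result
}
</pre>
With at-least-once delivery, either the sink has to tolerate duplicates as above, or the framework has to provide the stronger guarantee itself.<br />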
<span class="larger" style="background-color: white; box-sizing: border-box; font-size: 16px; text-align: center;"><span style="font-family: inherit;"><br /></span></span></div>
Intel developed a home-grown IoT system made out of open source components.<br />
Interestingly, Moty from Intel says their customers pushed back on cloud-based systems and instead insisted on on-premise implementations. What made implementation a breeze was a modular architecture built on Docker and CoreOS, which made the application very portable across customer IT infrastructures. This enabled components of the architecture to be duplicated: Moty called it a smart data pipe IoT platform.<br />
Akka, mentioned earlier, was also a major benefit in their system, allowing for back pressure using reactive streams. Akka is a highly concurrent application framework for micro-service oriented architectures.<br />
Since Intel started over 2 years ago, they incorporated Spark Streaming only later, and implemented their rules system, set up via a self-service UI.<br />
Another talk referred to the SMACK architecture, aiming to replace Lambda architectures.<br />
<br />
Downstream out-of-order processing is necessary if buffering prevents completely sequential delivery of sensor data. Hence the importance of frameworks like Flink.<br />
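<br />
To make the out-of-order problem concrete, here is a minimal sketch of a reorder buffer driven by a watermark (highest event time seen minus an allowed lateness). This is not Flink's API; Event, ReorderBuffer and the lateness value are hypothetical names and numbers chosen for illustration:<br />
<pre>
// Hypothetical sensor event carrying the time at which it was measured.
final case class Event(eventTime: Long, payload: String)

// Buffers events and releases them in event-time order once the watermark
// has passed them.
final class ReorderBuffer(allowedLateness: Long) {
  private var pending = Vector.empty[Event]
  private var maxSeen = Long.MinValue

  // Returns the events that are now safe to emit, sorted by event time.
  // An event older than the watermark is released immediately; a real
  // system would treat it as late data and handle it separately.
  def offer(e: Event): Seq[Event] = {
    maxSeen = math.max(maxSeen, e.eventTime)
    val watermark = maxSeen - allowedLateness
    val (ready, waiting) = (pending :+ e).partition(ev => watermark >= ev.eventTime)
    pending = waiting
    ready.sortBy(_.eventTime)
  }
}

object ReorderDemo {
  val buffer = new ReorderBuffer(allowedLateness = 2L)

  // Event 2 arrives after event 3, as can happen when an edge node buffers data.
  val arrivals = Seq(Event(1L, "a"), Event(3L, "c"), Event(2L, "b"), Event(5L, "e"))

  // Events come out in event-time order; event 5 is still held back.
  val emitted = arrivals.flatMap(buffer.offer)
}
</pre>
The allowed lateness is the trade-off knob: a larger value tolerates more disorder but delays every result by that much.<br />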
<br />
<h4 style="text-align: left;">
Data format</h4>
JSON has emerged as the preferred format to represent IoT data, as it is flexible enough for the variety of machine-generated information, and because it comes with a description of its own structure. New data stores generally work well with JSON (HBase, Cassandra, DynamoDB, MongoDB).<br />
<h4 style="text-align: left;">
Security</h4>
A new security model must enable authentication and authorization of devices, as well as encryption, according to Intel. However, Ryft says security is almost always talked about but rarely implemented in practice.<br />
<br />
<br />
Anyway, thanks to the speakers for these enlightening talks!<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0San Jose, CA, USA37.3382082 -121.8863286000000136.933999199999995 -122.53177560000002 37.7424172 -121.24088160000001tag:blogger.com,1999:blog-170648781806274754.post-25848838719778347922016-02-25T13:45:00.000-08:002016-02-25T13:53:58.666-08:0050 shades of Spark Streaming<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<div class="MsoTitle">
<br /></div>
</div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "calibri"; font-size: 14.0pt;"><span style="mso-tab-count: 1;"> </span>I
want to share my recent experience with a Spark Streaming implementation. My
(poor) attempt at a funny post title aims to convey that there are a lot of
nuances in how to work with a streaming application that need to be thought
through, and I want to describe this in my example. <o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "calibri"; font-size: 14.0pt;">My streaming use case is from a medical device
company that produces implantable mobile cardiac telemetry wearables.
These devices generate sensor data, which are collected, processed and
stored. This data is continuously streaming and needs to be analyzed and stored
in near real-time. </span><span style="font-family: "arial"; font-size: 13.0pt;">During
the screening and the monitoring period various body vitals, like heartbeat,
are obtained for each patient being monitored. Average heartbeat value needs to
be calculated for each patient for each stage (a fixed period of time). <o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;">The
overall data needs to be persisted into a data store that also serves a
front-end layer, in order to give business users search and visualization
access. We also need to pick the Spark Streaming API best suited to
computing these aggregates. We will review these components one by one.</span><span style="font-family: "times"; font-size: 10.0pt;"><o:p></o:p></span></div>
<h1>
Storage component</h1>
<h2>
Choice of architecture<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;"><span style="mso-tab-count: 1;"> </span>Firstly, an interesting tidbit of
information is that the sensor data from patients is sent to a central NAS
server. This data gets stored into files of the same fixed size on the server
every hour. This is a legacy system that should probably be replaced by a
distributed messaging system like Kafka in the future. <o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;">We first
considered a batch script to process these data files from the NAS, running for
example on Hadoop. However, as we learned more from our customer about the future
of this application, it became clear that the hourly file generation
rate was arbitrary: as more patients ramp up, the system will need
to keep up with the data volume and generate files more frequently. So
instead of a batch data processing system, we chose Spark
Streaming for this architecture; I believe this is typical of the </span><a href="http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html"><span style="font-family: "arial"; font-size: 13.0pt;">current trend that blends more and
more</span></a><span style="font-family: "arial"; font-size: 13.0pt;"> the batch, or
bounded data processing, with the unbounded data processing, essentially seeing
the existing hourly batching requirement as a special case of a streaming
architecture.<o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;">Spark
makes it easy to get data files from a folder:<o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "monaco"; font-size: 11.0pt;"><span style="mso-spacerun: yes;"> </span></span><span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;">val sc = new SparkContext(sparkConf)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val ssc = new
StreamingContext(sc, Seconds(batchSize))<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val lines =
ssc.textFileStream("file:///spark/test/data/")<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<h2>
Search layer<o:p></o:p></h2>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;">The next
piece is to store the data in near real time. A search/indexing system is a
good choice.<o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;">Elasticsearch
is a great component for this, and the integration point with Spark is </span><a href="https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html"><span style="font-family: "arial"; font-size: 13.0pt;">well documented</span></a><span style="font-family: "arial"; font-size: 13.0pt;">. Once Elasticsearch is configured
within Spark, our index-saving code looks like the following:<o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>trialRecordDStream.foreachRDD(lineRDD
=><span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>{<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-tab-count: 1;"> </span>EsSpark.saveToEsWithMeta(lineRDD,
"trials/trialdata") <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>}<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>)<o:p></o:p></span></div>
</div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<h2>
Visualization<o:p></o:p></h2>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;">And
finally we can put </span><a href="https://www.elastic.co/products/kibana"><span style="font-family: "arial"; font-size: 13.0pt;">Kibana</span></a><span style="font-family: "arial"; font-size: 13.0pt;"> as a visualization layer on top of
Elasticsearch to represent the results. The overall architecture is shown
below.<o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "arial"; font-size: 13.0pt;"><img border="0" height="268" src="file://localhost/Users/mlieber/Library/Caches/TemporaryItems/msoclip/0/clip_image002.png" v:shapes="Picture_x0020_2" width="434" /></span><span style="font-family: "arial"; font-size: 13.0pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
Spark streaming specifics<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;"><span style="mso-tab-count: 1;">            </span>The next piece was to design the
computations to run on these biometrics in near real time, like
maximum/minimum and mean for a given patient.
Maximum/minimum is pretty trivial (you just need to keep the
biggest/smallest value seen so far), but calculating an average requires an aggregation over
all the values seen, not just the last one, so some state has to be carried from one batch to the next. <o:p></o:p></span></div>
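<div class="MsoNormal">
<span style="font-family: 'arial'; font-size: 13.0pt;">As a minimal sketch of the state involved (independent of Spark, and with a hypothetical RunningStats name), a running count and sum are enough to keep the mean current without holding on to every reading, alongside the running min and max:</span></div>
<pre>
// Running summary for one patient: enough state to answer min, max and mean
// without keeping the individual readings.
final case class RunningStats(count: Long, sum: Long, min: Int, max: Int) {
  // Fold one more reading into the summary.
  def add(value: Int): RunningStats =
    RunningStats(count + 1, sum + value, math.min(min, value), math.max(max, value))

  def mean: Double = sum.toDouble / count
}

object RunningStats {
  // Summary for a patient with a single reading so far.
  def of(first: Int): RunningStats = RunningStats(1L, first.toLong, first, first)
}

object RunningStatsDemo {
  // Made-up heartbeat readings for one patient.
  val heartbeats = Seq(72, 80, 64, 76)
  val stats = heartbeats.tail.foldLeft(RunningStats.of(heartbeats.head))((acc, v) => acc.add(v))
}
</pre>
<div class="MsoNormal">
<span style="font-family: 'arial'; font-size: 13.0pt;">The question for a streaming application is then where this small piece of state lives between two batches.</span></div>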
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">This
requires some thought as to how the data is streaming into your application:
does the result of your calculation have to be bounded in a certain window of
time and discarded afterwards? Or do you need to keep this calculation current
at all times?<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">In our
case, the use case implied the latter. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">A first
approach for calculating an aggregation would be to collect the values for a
given patient over time, store them all in the storage layer, then read
them back every time for aggregation. Unfortunately this approach is flawed in
two ways: first, resources are wasted on sending the values to the store and then
retrieving them; second, this approach does not scale when the data is too
big to read back. <o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Instead,
Spark Streaming has an API that offers a more elegant solution: </span><a href="http://spark.apache.org/docs/latest/streaming-programming-guide.html"><span style="font-family: "arial"; font-size: 13.0pt;">updateStateByKey</span></a><span style="font-family: "arial"; font-size: 13.0pt;">(), to calculate stateful aggregations.
By applying the given function to the new values as well as the previous state for
each key, the current calculation state is maintained. Also, the
state is periodically saved to disk so that it can be recovered after a failure,
in a process called </span><a href="http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing"><span style="font-family: "arial"; font-size: 13.0pt;">checkpointing</span></a><span style="font-family: "arial"; font-size: 13.0pt;">. However, a downside
comes with this: on every batch the update function is applied to every key held in the state,
whether or not new data arrived for it, which leads to scalability issues as the state grows. <o:p></o:p></span></div>
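<div class="MsoNormal">
<span style="font-family: 'arial'; font-size: 13.0pt;">As a minimal sketch, the update function handed to updateStateByKey() takes the new values for a key in the current batch plus the previous state, and returns the new state. The MeanState and StatefulMean names are hypothetical; the function is written without any Spark dependency so that its behavior can be checked on its own:</span></div>
<pre>
// State kept per key between batches (hypothetical shape).
final case class MeanState(count: Long, sum: Long) {
  def mean: Double = sum.toDouble / count
}

object StatefulMean {
  // Same shape as the function updateStateByKey() expects:
  // (new values for the key in this batch, previous state) => new state.
  // Returning None would remove the key from the state.
  def update(newValues: Seq[Int], previous: Option[MeanState]): Option[MeanState] = {
    val old = previous.getOrElse(MeanState(0L, 0L))
    if (newValues.isEmpty) previous // no new data: keep the state as it was
    else Some(MeanState(old.count + newValues.size, old.sum + newValues.sum))
  }
}

object StatefulMeanDemo {
  // Three consecutive batches for one key; the last one carries no data.
  val afterBatch1 = StatefulMean.update(Seq(70, 80), None)
  val afterBatch2 = StatefulMean.update(Seq(90), afterBatch1)
  val afterBatch3 = StatefulMean.update(Seq.empty, afterBatch2)
}
</pre>
<div class="MsoNormal">
<span style="font-family: 'arial'; font-size: 13.0pt;">In a Spark job, assuming a DStream of (key, heartbeat) pairs named pairs, this would be wired in as pairs.updateStateByKey(StatefulMean.update _), after setting a checkpoint directory on the StreamingContext. Note that the third call above still runs even though no data arrived for the key.</span></div>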
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">A
more scalable solution, new in Spark 1.6, is the mapWithState() API, which only
processes the keys that received new data in a batch, which is great. If you look at <a href="https://docs.cloud.databricks.com/docs/spark/1.6/examples/Streaming%20mapWithState.html">the
example</a>, you’ll also see that the new API lets you supply an initial state
for the stream you are processing, and provides a timeout mechanism.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">In our
case, we are calculating the average for each unique record key,
which is a combination of a
patient id and a visit. Our value is a SummaryRecord, defined as below:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>case class
SummaryRecord(private var _patientId: String, private var _visitName:String,
private var _max:Int, private var _min:Int, private var _mean:Int)<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Note that
we store our aggregation values in this SummaryRecord, for each
PatientId_VisitName which is our key. There are a number of different ways to
design these aggregations, but this way we have one clean (K,V) pair at the
end, that we can directly store into our datastore.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">So, first,
we group the records we want to aggregate together by our key:<o:p></o:p></span></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;">var summaryRecordDStream: DStream[(String, Option[SummaryRecord])]
= trialRecordDStream.map( trialRecordTuple =><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>{<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>//Generate the summary
records pair from study trial records dstream<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val trialRecord =
trialRecordTuple._2;<span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val summaryRecord =
new SummaryRecord(trialRecord.patientId, trialRecord.visitName,
trialRecord.value, trialRecord.value, trialRecord.value);<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val key:String =
trialRecord.trialId + "_"+ trialRecord.patientId + "_" +
trialRecord.visitName + "_" //+ trialRecord.device <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>println("Summary
Key: " + key)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>(key, Some(summaryRecord));<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-tab-count: 2;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>}<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>//Group the summary
dstream by key for aggregation<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val
summaryGroupedDStream: DStream[(String,Iterable[Option[SummaryRecord]])] =
summaryRecordDStream.groupByKey();<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
</div>
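<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">As a rough illustration of the keying and grouping step above, here is a minimal sketch on plain Scala collections rather than on a DStream. The TrialRecord case class, its field types and the sample values are assumptions made for the example; only the layout of the key is taken from the code above.<o:p></o:p></span></div>

```scala
// Hypothetical stand-in for the post's TrialRecord; field types are assumed.
final case class TrialRecord(trialId: String, patientId: String, visitName: String, value: Int)

// Same composite key layout as in the post: trialId_patientId_visitName_
def summaryKey(r: TrialRecord): String =
  r.trialId + "_" + r.patientId + "_" + r.visitName + "_"

// Made-up sample data, for illustration only.
val records = List(
  TrialRecord("T1", "P1", "visit1", 10),
  TrialRecord("T1", "P1", "visit1", 14),
  TrialRecord("T1", "P2", "visit1", 7)
)

// groupBy on a local List plays the role that groupByKey() plays on the keyed DStream.
val grouped: Map[String, List[TrialRecord]] = records.groupBy(summaryKey)
```

<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Each key ends up with the list of records that share the same trial, patient and visit, which is the shape an aggregation needs.<o:p></o:p></span></div>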
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Then we
call mapWithState(), which in turn invokes our groupSummaryRecords2
function:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>//Calling mapWithState
to maintain the state of the summary object<span style="mso-spacerun: yes;">
</span>- only from the stream<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>var stateSpec = StateSpec.function(groupSummaryRecords2
_)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>var
summaryRecordUpdatedDStream =<span style="mso-spacerun: yes;">
</span>trialRecordDStream.mapWithState(stateSpec)<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
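<span style="font-family: "arial"; font-size: 13.0pt;">This returns
a stream of updated SummaryRecord instances, one per incoming record, each paired with its key and carrying the running sum and count from which the average for that key is calculated.<o:p></o:p></span></div>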
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Below is
the gist of the function implementation that we will pass to mapWithState().<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">The
signature (this is Scala) looks a bit confusing at first. It takes the batch time; the
key (here a unique combination of trial, patient and visit); the value, an optional Trial
record for that key; and the state, which holds the SummaryRecord as it is updated
over time. The function returns an Option wrapping a pair of the same key
and the new SummaryRecord for that key.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">In the
function, we essentially add the new value to the previous sum, increment the
count, and store everything in our intermediary summary record class.</span><span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><o:p></o:p></span></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;">def groupSummaryRecords2(batchTime: Time, key: String, value1:
Option[TrialRecord],<span style="mso-spacerun: yes;">
</span>optionSummary:State[SummaryRecord]):Option[(String, SummaryRecord)] = {<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-tab-count: 2;"> </span>var
min=Integer.MAX_VALUE;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-tab-count: 2;"> </span>var max=0;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-tab-count: 2;"> </span>var total=0;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val sum =
value1.map(_.value )<span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val sum2:
Option[Int] = optionSummary.getOption.map(_.sum) <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>// just summing 2
option traits<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val sum3:
Option[Int] = (sum :: sum2 :: Nil).flatten.reduceLeftOption(_ + _) <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>if (optionSummary.exists)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>{<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>// Keep the stored extremes unless the new value goes beyond them<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>max =
math.max(optionSummary.get.max, value1.get.value)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>min =
math.min(optionSummary.get.min, value1.get.value)<span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val
intermediaryOutput = new SummaryRecord(value1.get.trialId,<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;">
</span>value1.get.patientId ,<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;">
</span>value1.get.visitName,<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>max,<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>min, <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;">
</span>optionSummary.getOption.get.count+1, <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span><span style="mso-spacerun: yes;"> </span>sum3.getOrElse(0).asInstanceOf[Int])<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>val output = (key,
intermediaryOutput)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;">
</span>optionSummary.update(intermediaryOutput)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;">
</span>println("updated output: " + key.toString() + " : "
+ intermediaryOutput.patientId.toString() + intermediaryOutput.sum)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>Some(output)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none 1.0pt; color: #333333; font-family: "monaco"; font-size: 10.0pt; padding: 0in;"><span style="mso-spacerun: yes;"> </span>}<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Notice that
in this code we are always recomputing the mean: the running sum and count are
updated for every new record that comes in.<o:p></o:p></span></div>
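<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">The running-mean idea can be sketched without Spark. This is a minimal illustration, not the code from the repository: RunningSummary, update and the sample readings are hypothetical names and values, and the state is simplified to the numeric fields only.<o:p></o:p></span></div>

```scala
// Hypothetical, simplified state: the post's SummaryRecord also carries trial, patient and visit ids.
final case class RunningSummary(count: Int, sum: Int, min: Int, max: Int) {
  // The mean is derived from the stored sum and count, so no past values need to be kept.
  def mean: Double = sum.toDouble / count
}

// Fold one new value into the previous state, if there is one.
def update(previous: Option[RunningSummary], value: Int): RunningSummary =
  previous match {
    case Some(s) =>
      RunningSummary(s.count + 1, s.sum + value, math.min(s.min, value), math.max(s.max, value))
    case None =>
      // First value seen for this key: it is the sum, the minimum and the maximum at once.
      RunningSummary(1, value, value, value)
  }

// Made-up readings arriving one at a time for a single key.
val readings = List(10, 14, 6)
val finalState: Option[RunningSummary] =
  readings.foldLeft(Option.empty[RunningSummary])((state, v) => Some(update(state, v)))
```

<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Keeping only the count and the sum is what makes the mean cheap to refresh on every record; the previous state plays the role that the State object plays in mapWithState().<o:p></o:p></span></div>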
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">Once we are
done, we store the raw data (Trial data) and summary data into ES, and later on
can visualize the results with Kibana.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">This is
just one example of a transformation that Spark Streaming offers. The complete code is at <a href="https://github.com/mlieber/sparkstreaming-es">https://github.com/mlieber/sparkstreaming-es</a>.
The Spark Streaming API has a number of other helper functions in
store.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial"; font-size: 13.0pt;">As always,
please let me know if you have any questions about this.<o:p></o:p></span></div>
<div class="MsoNormal">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgljIW936PmDq1jGTvoT-gUssRUea-x0HnPh5kFog_mmsxri96794ROa_sNwCn9W9RzxREQg6DS4fYuYaRTf_NA_IP37ZGOLRY_cJy21Fn-H_M1SuN8IE0xLo73i40yhhGFOosu6eiJ644w/s1600/Screen+Shot+2015-10-12+at+12.50.38+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="181" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgljIW936PmDq1jGTvoT-gUssRUea-x0HnPh5kFog_mmsxri96794ROa_sNwCn9W9RzxREQg6DS4fYuYaRTf_NA_IP37ZGOLRY_cJy21Fn-H_M1SuN8IE0xLo73i40yhhGFOosu6eiJ644w/s320/Screen+Shot+2015-10-12+at+12.50.38+PM.png" width="320" /></a></div>
<br /></div>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" Priority="39" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" Name="toc 9"/>
</w:LatentStyles>
</xml><![endif]-->
<br />
<div class="MsoNormal">
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-47535906263321049202015-09-24T09:33:00.002-07:002015-09-24T09:33:39.851-07:00Apache Hive: the SQL Count of Monte Cristo<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
<br /></div>
<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<div class="MsoTitle">
Apache Hive: the SQL Count of Monte Cristo<o:p></o:p></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Project<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I have been
working on an industrial-type Hive project at a major Fortune 50 company
lately, and wanted to share my experience. In some ways, I believe my
experience is representative of the current state of Hadoop
adoption: the environment I was in was neither a bleeding-edge startup
trying every new tool out there, nor the technology laggard typical of
Midwest companies.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Technical environment<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
First, the state of the work environment I was in: <o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt 'Times New Roman';">
</span></span></span><!--[endif]-->Hadoop environments: a major Hadoop
vendor's distribution had been adopted. Typical separate Development and Production
clusters, of several hundred nodes. <o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Hadoop 1.x installed, but not the latest
build: Hive and Pig 0.13 on MapReduce 1.0, i.e. no YARN. Also, no
Spark in sight.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Heavy users of Hive and Pig, with no custom
MapReduce. A Java shop, with a little bit of Python. Datameer and Platfora were
installed, and other tools, like Alation, were being evaluated. Not going to the
cloud anytime soon, because of strong concerns about data security.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Data analysts were not very aware of, or curious
about, the tools on the market, and were more at home with SQL than with MapReduce.
So essentially the work was no longer about Big Data, but about translating the
requirements into technical specs correctly; all optimization techniques and
Hadoop specifics were deferred to the MapReduce platform (i.e. using the default
parameters already in place).<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Problems<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I encountered a few technical issues, and wanted to note
them here, since they are likely to be common occurrences.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The main one involved a complex query with a few joins that wrote its
result back on top of one of its own source tables (‘INSERT OVERWRITE TABLE A …
SELECT * FROM A ..’). Intermittently, it failed with a cryptic error at the
overwrite step, even though the SELECT part of the query ran fine.
Coincidentally, my colleague encountered a similar problem with Pig, where the
query refused to run to completion. <o:p></o:p></div>
<div class="MsoNormal">
The solution was to save the result of the query in
a temporary table/set of tuples, and then write it back to the actual table
afterwards, i.e.: <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoListParagraphCxSpFirst" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">-- to remedy a bug!<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">drop table mytable_temp;<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">create table IF NOT EXISTS
mytable_temp (<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">-- .. same schema as the actual table
giving issues<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">)<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> ROW FORMAT DELIMITED<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">FIELDS TERMINATED BY '\001'<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">ESCAPED BY '\n'<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">LINES TERMINATED BY '\n'<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">STORED AS TEXTFILE<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">;<o:p></o:p></span></div>
</div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Monaco;">-- Now run the actual query, inserting into our
temp table<o:p></o:p></span></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoListParagraphCxSpFirst" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">insert into table mytable_temp<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">select …<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p> </o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">-- to remedy the bug ..<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">insert overwrite table mytable<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">select * from mytable_temp;<o:p></o:p></span></div>
</div>
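<div class="MsoNormal">
The workaround above can be sketched as a small helper that a wrapper script might use to generate the statements. This is a minimal sketch in Python, not the code used on the project: the function name <code>staged_overwrite</code> and its <code>suffix</code> argument are hypothetical, and it swaps in Hive's <code>CREATE TABLE ... LIKE</code> to copy the schema instead of spelling out the columns and row format as in the listing above.</div>

```python
def staged_overwrite(target, select_sql, suffix="_temp"):
    """Return HiveQL statements that rebuild `target` via a staging table.

    `target` is the table the SELECT also reads from; `select_sql` is the
    full SELECT statement, without a trailing semicolon.
    """
    staging = target + suffix
    return [
        # Start from an empty staging table with the same schema as the target.
        "DROP TABLE IF EXISTS {0}".format(staging),
        "CREATE TABLE {0} LIKE {1}".format(staging, target),
        # Run the complex query once, writing somewhere other than its source.
        "INSERT OVERWRITE TABLE {0} {1}".format(staging, select_sql),
        # Only now replace the contents of the real table.
        "INSERT OVERWRITE TABLE {0} SELECT * FROM {1}".format(target, staging),
    ]


if __name__ == "__main__":
    # Placeholder query; the real one would be the complex SELECT with joins.
    for statement in staged_overwrite("mytable", "SELECT * FROM mytable"):
        print(statement + ";")
```

<div class="MsoNormal">
The point of the design is simply that the query never reads from and overwrites the same table in a single statement; the final copy is a trivial SELECT.</div>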
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Best Practices<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Some best practices I learned from writing and churning out
a lot of code: <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Try not to set variables directly in the code;
pass them in from upstream instead, from your scheduler of choice:<o:p></o:p></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: black; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><hdp:hive-server</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> </span><span style="color: #7f007f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">host</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">=</span><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">"some-host"</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> </span><span style="color: #7f007f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">port</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">=</span><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">"10001"</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> </span><span style="color: #7f007f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">properties-location</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">=</span><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">"classpath:hive-dev.properties"</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> </span><span style="color: #7f007f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">configuration-ref</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">=</span><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: 
Courier;">"hadoopConfiguration"</span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">></span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: black; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><span style="mso-spacerun: yes;"> </span>someproperty=somevalue<span style="mso-spacerun: yes;"> </span>hive.exec.scratchdir=/tmp/mydir<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: black; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"></hdp:hive-server></span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p></o:p></span></div>
</div>
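<div class="MsoNormal">
For a scheduler that shells out to the Hive command line rather than going through a Spring configuration like the one above, the same idea can be sketched as follows. This is only an illustration under stated assumptions: the helper <code>hive_command</code>, the script name <code>load.hql</code> and the variable names are hypothetical. The only Hive features relied on are the CLI's <code>--hivevar</code> and <code>-f</code> options, with the script reading a value as <code>${hivevar:run_date}</code>.</div>

```python
def hive_command(script_path, variables):
    """Build an argv list for the Hive CLI, with one --hivevar per variable.

    The .hql script itself stays free of hard-coded values and refers to
    them as ${hivevar:name}.
    """
    command = ["hive"]
    # Sort the names so the generated command is stable from run to run.
    for name in sorted(variables):
        command += ["--hivevar", "{0}={1}".format(name, variables[name])]
    command += ["-f", script_path]
    return command


if __name__ == "__main__":
    # Hypothetical values that a scheduler would supply at run time.
    argv = hive_command("load.hql", {"run_date": "2015-01-01", "env": "dev"})
    print(" ".join(argv))
```

<div class="MsoNormal">
The resulting list can be handed to whatever process launcher the scheduler uses; the script only ever sees the values it is given.</div>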
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Separate the Hive clauses (SELECT, FROM) from
the column names (typically on a different line) for readability, same as in
SQL.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Ensure that after any manipulation of a column,
you give the result an explicit name with AS, as in the example below. You might
get away with not doing it, but I encountered some cryptic issues in UNIONs because
I hadn’t named all the columns.<o:p></o:p></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">COALESCE(n.vendor_info_id,
f.OPEN_Customer) AS OPEN_Customer<o:p></o:p></span></div>
</div>
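<div class="MsoNormal">
To see why the alias matters, it helps to look at the column name an engine assigns when none is given. The sketch below is runnable but uses Python's built-in SQLite as a stand-in, which is an assumption made purely for illustration: Hive is not involved, and the two single-column tables <code>n</code> and <code>f</code> are hypothetical. Hive generates its own names for unaliased expressions (of the <code>_c0</code> kind), but the principle is the same: without AS, the name is whatever the engine decides.</div>

```python
import sqlite3

# SQLite stands in for Hive here, purely to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE n (vendor_info_id TEXT)")
conn.execute("CREATE TABLE f (OPEN_Customer TEXT)")


def column_names(sql):
    """Return the output column names the engine assigns to a query."""
    return [column[0] for column in conn.execute(sql).description]


expression = "COALESCE(n.vendor_info_id, f.OPEN_Customer)"
unaliased = column_names("SELECT {0} FROM n, f".format(expression))
aliased = column_names("SELECT {0} AS OPEN_Customer FROM n, f".format(expression))

print(unaliased)  # a name generated by the engine
print(aliased)    # the name chosen with AS
```

<div class="MsoNormal">
When every branch of a UNION names its columns explicitly, the output schema no longer depends on generated names.</div>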
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Some say that they don’t “trust” Hive’s built-in
functions, and would rather just declare Hive tables and work within Pig (via
HCatalog) for the most part. Other engineers do work in HiveQL, but for
anything complicated would rather write UDFs/UDTFs in a different language. I
don’t really agree with these views, and was pleased to see that most if not
all of my requirements could be met in pure HiveQL. Take a look at this, for
example:<o:p></o:p></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p> </o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">-- Let's take only 1 record
for each account_id: when there are multiple, we only need one,<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">-- we take whichever. <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">insert into table memberdata<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">select T.cm_id, T.customer_id,
T.record_id, T.account_id <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">from<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">(<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">select <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> row_number() over (partition
by n.account_id order by n.record_id) as RANK,<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
reflect("java.util.UUID", "randomUUID") AS cm_id,<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> n.cus_id AS customer_id,<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> n.record_id AS record_id,<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> n.account_id as account_id<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">from datasource n<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">) T<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">where T.rank = 1<o:p></o:p></span></div>
</div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->The code above uses the
row_number() window function to keep a single row for each value that is repeated in a
particular column (here account_id). Rows are numbered within each account_id in record_id
order, and only the first one (rank = 1) is kept, together with the rest of its columns.
A unique id for the record's primary key is then generated by calling Java's UUID.randomUUID() through reflect().<o:p></o:p></div>
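<div>
To make the row_number() logic easier to follow, here is a minimal Python sketch of the same idea. It is not Hive code: the sample rows and the function name first_row_per_account are made up for illustration, and only the column names come from the query above.</div>

```python
import uuid


def first_row_per_account(rows):
    """Keep the first row of each account_id (lowest record_id) and add a cm_id."""
    kept = []
    seen_accounts = set()
    # Sorting by (account_id, record_id) plays the role of
    # PARTITION BY account_id ORDER BY record_id.
    for row in sorted(rows, key=lambda r: (r["account_id"], r["record_id"])):
        if row["account_id"] in seen_accounts:
            continue  # row_number() would be 2 or more for this row
        seen_accounts.add(row["account_id"])
        kept.append({
            "cm_id": str(uuid.uuid4()),  # random UUID, like java.util.UUID.randomUUID()
            "customer_id": row["cus_id"],
            "record_id": row["record_id"],
            "account_id": row["account_id"],
        })
    return kept


# Made-up sample rows standing in for the datasource table.
datasource = [
    {"cus_id": "c1", "record_id": 12, "account_id": "A"},
    {"cus_id": "c1", "record_id": 10, "account_id": "A"},
    {"cus_id": "c2", "record_id": 11, "account_id": "B"},
]
memberdata = first_row_per_account(datasource)
```

<div>
With these sample rows, account A keeps only its record 10 and account B keeps record 11, each with its own generated cm_id.</div>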
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->A gotcha related to the previous snippet: do
not use Hive's hash() function to generate unique ids. It produces
collisions very quickly (after a few hundred rows).<o:p></o:p></div>
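<div>
One plausible reason collisions appear so early: hash() returns a 32-bit integer, and, as far as I can tell (treat this as an assumption to verify on your Hive version), it combines its arguments Java-style as h = 31 * h + value. The Python sketch below models only that combining rule, with each integer hashing to itself; it is an illustration, not Hive's actual code, and the function names are made up.</div>

```python
def to_int32(value):
    """Wrap an integer to a signed 32-bit value, as Java int arithmetic does."""
    return (value + 2**31) % 2**32 - 2**31


def combined_hash(values):
    """Model of a Java-style multi-column hash: h = 31 * h + value (assumption)."""
    h = 0
    for v in values:
        h = to_int32(31 * h + v)
    return h


# Two different (col1, col2) pairs that land on the same hash value.
collision_a = combined_hash([1, 32])
collision_b = combined_hash([2, 1])

# 10,000 distinct pairs of small integers produce far fewer distinct hashes.
pairs = [(a, b) for a in range(100) for b in range(100)]
distinct_hashes = len({combined_hash(p) for p in pairs})
print(collision_a, collision_b, distinct_hashes)
```

<div>
Under this model, small id-like values collide almost immediately, which is why a random UUID is the safer choice for a generated key.</div>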
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Another example, this time combining IF() and CASE to derive a column value:<o:p></o:p></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">IF ( n2.final_spend is not null and
opp.frequency is not null AND opp.average_transaction_size is null AND
(vil.vendor is not null AND vil.stage &lt;&gt; 'SPEND FULFILLED'),<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> CASE opp.frequency
WHEN 'ANNUAL' THEN n2.final_spend / 1<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
WHEN 'SEMI-ANNUAL' then n2.final_spend / 2<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
WHEN 'QUARTERLY' THEN n2.final_spend / 4<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
WHEN 'BI-MONTHLY' THEN n2.final_spend / 6<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
WHEN 'MONTHLY' THEN n2.final_spend / 12<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
WHEN 'BI-WEEKLY' THEN n2.final_spend / 26<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
WHEN 'WEEKLY' THEN n2.final_spend / 52<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
WHEN 'DAILY' THEN n2.final_spend / 365 END,<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">
opp.average_transaction_size) as average_transaction_size</span><o:p></o:p></div>
</div>
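<div>
For readers who prefer to see the branching spelled out, here is a hedged Python sketch of what the IF/CASE expression above computes. The divisors suggest that final_spend is an annual figure, which is my reading rather than something stated in the query. The function name and the vendor_open flag (standing in for the two vil.* checks) are hypothetical.</div>

```python
# Number of transactions per year for each frequency label used in the CASE.
PERIODS_PER_YEAR = {
    "ANNUAL": 1, "SEMI-ANNUAL": 2, "QUARTERLY": 4, "BI-MONTHLY": 6,
    "MONTHLY": 12, "BI-WEEKLY": 26, "WEEKLY": 52, "DAILY": 365,
}


def average_transaction_size(final_spend, frequency, current_size, vendor_open):
    """Mirror of the IF/CASE expression; vendor_open stands for the vil.* checks."""
    derive = (final_spend is not None and frequency is not None
              and current_size is None and vendor_open)
    if not derive:
        return current_size  # third argument of IF(): keep the existing value
    periods = PERIODS_PER_YEAR.get(frequency)
    # A CASE without ELSE yields NULL for an unlisted frequency.
    return None if periods is None else final_spend / periods
```

<div>
Note the edge case this makes visible: a frequency that is not listed in the CASE gives NULL (None here), because the CASE has no ELSE branch.</div>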
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->When joining several tables in one query, the
order of the joins matters, as stated in the Hive documentation (“<span style="background: white; color: #333333; font-family: Arial; font-size: 10.5pt; mso-fareast-font-family: "Times New Roman";">Joins are NOT commutative!” https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins</span>).
Joins are left-associative, so they are applied from left to right. Either put the LEFT JOINs first, followed by the INNER
JOINs, or use subqueries instead:<o:p></o:p></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">Select ..<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">FROM<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">( select<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">n.vendor_id, n.location_id,
n.cm_id, inc.final_spend as final_spend, <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> endor.endorsement_id as
endorsement_id, inc.ap_file_id,<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> pers.person_id as person_id<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">from<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">myrecords inc<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">-- there should always be a match,
so no OUTER JOIN<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">JOIN<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">(select o.cm_id, v.record_id, v.vendor_id,
v.location_id from rim v<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">join members o<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">on v.record_id = o.record_id) n<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">ON inc.record_id = n.record_id<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">LEFT OUTER JOIN
vendor_contact_person pers<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">ON<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">-- for checking if this vendor has
an endorsement that exists, for 'top vendor flag'<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">LEFT OUTER JOIN<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">Endorsement endor<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">on (n.vendor_id = endor.vendor_id
and n.location_id = endor.location_id and n.cm_id = endor.cm_id)<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">) n2<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">FULL OUTER JOIN <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="background: #F8F8F8; border: none; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #CCCCCC .75pt; mso-list: l0 level1 lfo1; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #2a00ff; font-size: 10.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">..<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
As you can see, the FROM clause queries a derived table,
aliased n2, that is itself composed of several JOINs.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
UDFs and UDTFs<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A coworker had to create a UDTF (a UDF that generates a table) and chose Python
to do so. I hadn’t realized that a UDF/UDTF can now be written in pretty much
any language for Hive, as long as it respects the constraints of Hadoop
Streaming: the program must essentially be able to act as its own mapper or
reducer in the pipeline, reading its input from standard input and writing its
output to standard output. Be warned, however, that performance won’t be as
good as with a UDF written in Java.<o:p></o:p></div>
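<div class="MsoNormal">
As an illustration only, here is a minimal sketch of what such a streaming-style script could look like in Python. It is not the coworker's actual code: the function name explode_tags, the two-column input layout (an id and a comma-separated list) and the sample data are assumptions made for this example. The only contract it relies on is the one described above: tab-separated rows come in on standard input, and one or more rows go out on standard output.</div>

```python
import io
import sys


def explode_tags(line, sep=","):
    """Turn one input row 'id<TAB>tag1,tag2' into one output row per tag.

    Hypothetical layout: column 1 is an id, column 2 a comma-separated list.
    """
    record_id, _, tags = line.rstrip("\n").partition("\t")
    return [f"{record_id}\t{tag.strip()}" for tag in tags.split(sep) if tag.strip()]


def main(stdin, stdout):
    """Read rows from stdin and write the exploded rows to stdout."""
    for line in stdin:
        for row in explode_tags(line):
            stdout.write(row + "\n")


# Local dry run with in-memory streams; on the cluster the script would
# instead end with: main(sys.stdin, sys.stdout)
out = io.StringIO()
main(io.StringIO("1\ta,b\n2\tc\n"), out)
sys.stdout.write(out.getvalue())
```

<div class="MsoNormal">
From Hive, a script like this is typically shipped with ADD FILE and called through SELECT TRANSFORM (...) USING 'python explode_tags.py' AS (...); the file name here is of course hypothetical.</div>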
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Workflow for batch processing<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
There are a few options for this, mentioned everywhere. On this project we used
Spring Batch, which was essentially just an XML wrapper around the scripts that
we were writing, whether Pig, Hive or shell scripts (called tasklets). It is of
the form:<o:p></o:p></div>
<div style="background: #F8F8F8; border: solid #CCCCCC 1.0pt; mso-border-alt: solid #CCCCCC .75pt; mso-element: para-border-div; padding: 5.0pt 8.0pt 5.0pt 8.0pt;">
<div class="MsoNormal" style="background: #F8F8F8; border: none; mso-border-alt: solid #CCCCCC .75pt; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt;">
<span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><hdp:hive-tasklet</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> </span><span style="color: #7f007f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">id</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">=</span><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">"hive-script"</span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">><o:p></o:p></span></div>
<div class="MsoNormal" style="background: #F8F8F8; border: none; mso-border-alt: solid #CCCCCC .75pt; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt;">
<span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><span style="mso-spacerun: yes;"> </span></span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><hdp:script></span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">myHivescript.hql</span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"></hdp:script><o:p></o:p></span></div>
<div class="MsoNormal" style="background: #F8F8F8; border: none; mso-border-alt: solid #CCCCCC .75pt; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt;">
<span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><span style="mso-spacerun: yes;"> </span></span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><hdp:script</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> </span><span style="color: #7f007f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">location</span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">=</span><span style="color: #2a00ff; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;">"classpath:org/company/hive/script.q"</span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> /><o:p></o:p></span></div>
<div class="MsoNormal" style="background: #F8F8F8; border: none; mso-border-alt: solid #CCCCCC .75pt; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt;">
<span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><span style="mso-spacerun: yes;">  </span><hdp:parameters></span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"> </span><span style="font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Monaco;">set hive.metastore.warehouse.dir=/opt/hive/warehouse;</span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p></o:p></span></div>
<div class="MsoNormal" style="background: #F8F8F8; border: none; mso-border-alt: solid #CCCCCC .75pt; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt;">
<span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><span style="mso-spacerun: yes;"> </span></hdp:parameters><o:p></o:p></span></div>
<div class="MsoNormal" style="background: #F8F8F8; border: none; mso-border-alt: solid #CCCCCC .75pt; mso-padding-alt: 5.0pt 8.0pt 5.0pt 8.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt;">
<span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><span style="mso-spacerun: yes;"> </span></span><span style="color: #3f7f7f; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"></hdp:hive-tasklet></span><span style="color: black; font-family: Courier; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
What it essentially gives you is a way to control the flow
of the pipeline. It has a built-in retry of a tasklet on error, but 99% of the
time this is completely useless: the cause of the error is usually the data or
the script, not the infrastructure, since Hadoop already reruns a failed task
on a different node by default. The same can be said of Oozie (shipped with
the Hortonworks distribution), however. The other pain point was having to
comment out tasklets (in the XML config file) whenever we wanted to test a
subset of them, which is probably not the optimal way to do this.<o:p></o:p></div>
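<div class="MsoNormal">
To make these two complaints concrete, here is a small sketch in Python of a step runner that picks a subset of steps by name instead of commenting them out, and retries only failures flagged as transient. This is not how Spring Batch or Oozie work internally; run_pipeline, TransientError and the step names are all hypothetical and only illustrate the idea.</div>

```python
class TransientError(Exception):
    """Hypothetical marker for infrastructure failures that are worth retrying."""


def run_pipeline(steps, only=None, max_retries=2):
    """Run (name, callable) steps in order and return the names executed.

    `only` restricts the run to the named steps. Errors other than
    TransientError (bad data, broken script) are raised immediately,
    because rerunning the same step would fail the same way.
    """
    executed = []
    for name, step in steps:
        if only is not None and name not in only:
            continue
        attempts = 0
        while True:
            try:
                step()
                break
            except TransientError:
                attempts += 1
                if attempts > max_retries:
                    raise
        executed.append(name)
    return executed


# Example: run only the last two steps of a three-step pipeline.
steps = [("load", lambda: None), ("join", lambda: None), ("export", lambda: None)]
print(run_pipeline(steps, only={"join", "export"}))
```

<div class="MsoNormal">
The design choice is simply to make the subset a run-time argument rather than an edit to the workflow definition, so the definition itself stays untouched between test runs.</div>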
<div class="MsoNormal">
<br /></div>
<h2>
Testing<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Which brings the conversation to testing: after a few projects on Big Data
batch systems, I am still not aware of a good tool that alleviates some of the
pain of system testing Hadoop pipelines.<o:p></o:p></div>
<div class="MsoNormal">
From my experience, testing should be done in two passes, followed by a review:<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->With a small subset of data, test each script
against the business rules and ensure each column returns the appropriate
value (functional testing).<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->With a larger data set, test the record counts
to ensure that JOINs and FILTERs, as well as OVERWRITEs (in Hive tables), keep
and drop the rows you expect. A SELECT COUNT(primary_key) should be run on each
table, since no integrity constraint is typically set up in Hive or Pig.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Following this, results should be shared with a
business analyst who is the SME on the data domain.<o:p></o:p></div>
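<div class="MsoNormal">
The count checks can be sketched as follows. This example uses Python's built-in SQLite as a local stand-in for Hive, which is an assumption made purely so that it runs anywhere; the table names rim and members are borrowed from the query shown earlier, while the sample rows and the helper key_counts are made up. It compares the total row count with the count of distinct keys, which is one way to spot a JOIN that silently duplicated or dropped records.</div>

```python
import sqlite3


def key_counts(conn, table, key):
    """Return (total rows, non-null keys, distinct keys) for one table.

    Table and column names are interpolated as-is, so pass trusted names only.
    """
    sql = f"SELECT COUNT(*), COUNT({key}), COUNT(DISTINCT {key}) FROM {table}"
    return conn.execute(sql).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rim (record_id INTEGER, vendor_id INTEGER);
    CREATE TABLE members (record_id INTEGER, cm_id INTEGER);
    INSERT INTO rim VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO members VALUES (1, 100), (2, 200), (2, 201);
    CREATE TABLE joined AS
        SELECT o.cm_id, v.record_id, v.vendor_id
        FROM rim v JOIN members o ON v.record_id = o.record_id;
""")

for table in ("rim", "joined"):
    total, non_null, distinct = key_counts(conn, table, "record_id")
    status = "ok" if total == distinct else "duplicate or missing keys"
    print(table, total, non_null, distinct, status)
```

<div class="MsoNormal">
In this made-up data, record 2 matches two members and record 3 matches none, so the joined table has as many rows as the source but fewer distinct keys: exactly the kind of discrepancy that goes unnoticed when no integrity constraint exists.</div>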
<div class="MsoNormal">
<br /></div>
<h2>
Tools<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Typically on Big Data systems, the code has to be run and tested on the cluster
itself; there is no way to write and run code locally, or only a very limited
one (for example to check the syntax of a function, or when writing a UDF). The
way we wrote code fell into two categories:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- Write code in Emacs/Vim, directly on the cluster. This
works relatively well, but lacks an IDE's ability to check and correct the
syntax.<o:p></o:p></div>
<div class="MsoNormal">
- Use the Sublime Text editor on local machines, with Hive/Pig syntax
checking and a plug-in to FTP the code directly to the cluster. This proved
relatively elegant and worked pretty well once installed. The installation was
not for the faint of heart, however.<o:p></o:p></div>
<div class="MsoNormal">
On a related note, in my numerous projects, I have
unfortunately never seen anyone use GUIs like Hue for writing Big Data code.
Which brings me to the point that the data frame concept found in pandas or in
Zeppelin notebooks is really useful!<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
</div>
<div class="MsoNormal">
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-58576287030815466692015-07-13T22:24:00.002-07:002015-07-13T22:24:56.165-07:00Calculating the Customer lifetime value in Prediction.IO via a Python notebook & Hive<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">As
described on </span><a href="http://www.prediction.io/"><span style="background: white; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">their website</span></a><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">, Prediction.io (PIO) brings together all the pieces
needed to form a machine learning platform: <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #333333; font-family: Arial; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">A machine learning engine, built on top of the Spark MLlib
library, that trains and evaluates predictive models;<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #333333; font-family: Arial; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">A Query engine to serve the results;<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #333333; font-family: Arial; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">A data collection layer, called the Event server.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Together,
these pieces form a deployable, production-ready machine learning platform.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">In this
post we will focus on the Event server component. This piece is essential to
the framework: the Event server is where data collection takes place and
the layer on which analytics is built. In addition, it is highly scalable, so it can
accommodate Big Data use cases.</span><span style="font-family: Times; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">We will
first review what the Event server is good for, take a look at its architecture
and intrinsic data structure, and then dive into an exploratory analytics
example.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><br /></span></div>
<h2>
<span style="background: white;">What is the function of the Event Server?</span></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><span style="mso-tab-count: 1;"> </span>The Event server stores the
data that is later fed into the machine learning engine. It essentially acts as the
data repository of the PIO platform, and as such it is where all of your data
is unified.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Following
the separation-of-concerns principle, the Event server is decoupled from
the other PIO elements. This is convenient because it acts as its own
independent tier and can be used as such. <o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; font-family: Cambria; font-size: 12.0pt; mso-ansi-language: EN-US; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-language: AR-SA; mso-bidi-theme-font: minor-bidi; mso-fareast-font-family: "MS 明朝"; mso-fareast-language: EN-US; mso-fareast-theme-font: minor-fareast; mso-hansi-theme-font: minor-latin;"><br clear="all" style="mso-special-character: line-break; page-break-before: always;" />
</span>
</div>
<div class="MsoNormal">
<br /></div>
<h3>
<span style="background: white;">Architecture overview</span></h3>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqt4zqxVzxdpDeL7LyT7F9QxBr60cYf6iDtNNWaNqDu64id7qw3AbA3Qlbbbc6Gp6QsBQFPRKq5dY9Gk_MKXVcahw-3j4n-V3S_2lzApxapBpZRO9xHfF-eoZGUeFPiaV5fwfiqKCj-8Cz/s1600/pio1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="313" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqt4zqxVzxdpDeL7LyT7F9QxBr60cYf6iDtNNWaNqDu64id7qw3AbA3Qlbbbc6Gp6QsBQFPRKq5dY9Gk_MKXVcahw-3j4n-V3S_2lzApxapBpZRO9xHfF-eoZGUeFPiaV5fwfiqKCj-8Cz/s640/pio1.png" width="640" /></a></div>
<br />
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: "Times New Roman";">(from </span><a href="http://docs.prediction.io/datacollection/"><span style="background: white; font-family: Arial; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: "Times New Roman";">web site</span></a><span style="background: white; color: #333333; font-family: Arial; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: "Times New Roman";">)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">By
default, the Event server is built on top of </span><a href="https://hbase.apache.org"><span style="background: white; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Apache HBase</span></a><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> (although PIO can be
deployed on other NoSQL stores as well if needed). This allows for horizontal
scaling and near-real-time storage and retrieval of the event data.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">The PIO
engine expects the events in a certain data structure. Conveniently, as a data
scientist/developer working with the PIO framework, you are not expected to
interact with HBase directly. Instead, you store events in the PIO data structure
through the PIO Event API, either as plain HTTP requests or through one of the PIO SDKs; this is
documented fully </span><a href="http://docs.prediction.io/datacollection/eventapi/"><span style="background: white; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">here</span></a><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">. Let’s review this in more detail.<o:p></o:p></span></div>
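<div class="MsoNormal">
As a rough illustration of the HTTP route, here is a minimal Python sketch that builds a POST request carrying a single event. The host and port (<b>localhost:7070</b>), the <b>/events.json</b> path, the <b>accessKey</b> query parameter and the placeholder key are assumptions taken from my recollection of the Event API documentation linked above; check them against your own installation. The helper names are hypothetical.</div>

```python
import json
import urllib.parse
import urllib.request


def build_event_request(event, access_key, base_url="http://localhost:7070"):
    """Build (but do not send) a POST request carrying one event as JSON."""
    # Assumed endpoint layout: <base_url>/events.json?accessKey=<key>
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        url=f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(request):
    """Send the request; this needs a running Event server, so it is not called here."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# A made-up event, only used to show the shape of the request.
example_event = {
    "event": "$set",
    "entityType": "customer",
    "entityId": "c-001",
    "properties": {"channel": "email"},
}
request = build_event_request(example_event, access_key="YOUR_ACCESS_KEY")
print(request.get_method(), request.full_url)
```

<div class="MsoNormal">
Keeping the request construction separate from the sending makes it easy to inspect the payload before anything reaches the server.</div>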
<h2>
<span style="background: white;">How is the data stored?</span></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Essentially,
the PIO event data structure is designed to capture any type of data
interaction. The structure comprises:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: none; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 1184;">
<tbody>
<tr>
<td style="border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">The <b>name</b> of the
event<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
</td>
<td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
Either a custom event name or one of the reserved operations:<o:p></o:p></div>
<div class="MsoNormal">
- $set, to create the entity or set its properties.<o:p></o:p></div>
<div class="MsoNormal">
- $unset, to remove properties from the entity.<o:p></o:p></div>
<div class="MsoNormal">
- $delete, to delete the entity.<o:p></o:p></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">The “<b>type</b>” of
entity being used</span><o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Which entity is represented, e.g. a user, order, object,
etc. </span><o:p></o:p></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">The entity id<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">A unique id for this entity<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Another set of entity type and id, called <b>target entity</b>; optional</span><o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoListParagraph" style="margin-left: -.9pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #333333; font-family: Arial; mso-fareast-font-family: Arial;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span></span><!--[endif]--><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Another entity that has a relationship with the entity
above (e.g. user-item)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">The <b>properties</b>
associated with the entity or the event.</span><o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal" style="margin-left: -.9pt;">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">This is a set of key-value pairs. The
properties can be associated with the entity, or the event.<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-left: -.9pt;">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Example: <o:p></o:p></span></div>
<div class="MsoNormal" style="margin-left: -.9pt;">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Set the user’s information details like name,
gender, etc.<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-left: -.9pt;">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">or<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Set information about </span><span style="background: white; color: #222222; font-family: Arial; mso-fareast-font-family: "Times New Roman";">a
rate event, e.g. “{rating : 4}”. </span><span style="font-size: 10pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-left: -.9pt;">
<br /></div>
<div class="MsoNormal">
<br /></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">An optional <b>event
time</b>.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221">
<div class="MsoNormal">
<br /></div>
</td>
</tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">All subsequent
changes to the properties of an entity are stored over time (ordered by
event time), which is characteristic of NoSQL data store behavior.</span><span style="font-size: 10pt;"><o:p></o:p></span></div>
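<div class="MsoNormal">
To make the idea of property changes stored over time more concrete, here is a small Python sketch that replays a history of $set, $unset and $delete events in event-time order to work out the current properties of one entity. It is only an illustration of the concept, not PIO's actual implementation; the function name and the sample data are made up.</div>

```python
from datetime import datetime


def current_properties(events):
    """Replay $set / $unset / $delete events in event-time order."""
    state = None  # None means the entity does not (or no longer) exist
    for event in sorted(events, key=lambda e: e["eventTime"]):
        name = event["event"]
        props = event.get("properties", {})
        if name == "$set":
            # Later values overwrite earlier ones, key by key.
            state = {**(state or {}), **props}
        elif name == "$unset" and state is not None:
            # Only the listed keys are removed; the entity itself stays.
            for key in props:
                state.pop(key, None)
        elif name == "$delete":
            state = None
    return state


# The history is deliberately out of order: event time decides, not arrival order.
history = [
    {"event": "$set", "eventTime": datetime(2015, 1, 1),
     "properties": {"name": "Ann", "channel": "email"}},
    {"event": "$set", "eventTime": datetime(2015, 3, 1),
     "properties": {"channel": "search"}},
    {"event": "$unset", "eventTime": datetime(2015, 2, 1),
     "properties": {"name": None}},
]
print(current_properties(history))  # {'channel': 'search'}
```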
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">As
mentioned before, the Event server is the data store of the PIO framework. Not only can you easily import data into it
via a REST API, but you can also plug in any analytics tool to visualize,
interrogate and model that data for exploratory analytics purposes. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">This is
done by using the </span><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">export</span></b><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> command to move the data into a business analytics tool of your choice.
Let’s review a complete example of this. <o:p></o:p></span></div>
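<div class="MsoNormal">
Once the events are out of the Event server, even a few lines of Python are enough for a first look at them. The sketch below assumes the exported data is available as one JSON event per line; that layout is an assumption on my part, so check what your PIO version actually produces. An in-memory buffer with made-up events stands in for the exported file.</div>

```python
import io
import json
from collections import Counter


def count_by_entity_type(lines):
    """Count events per entityType in newline-delimited JSON."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)["entityType"]] += 1
    return counts


# Stand-in for an exported file; replace with open("<your export file>").
exported = io.StringIO(
    '{"event": "$set", "entityType": "customer", "entityId": "c-001"}\n'
    '{"event": "$set", "entityType": "order", "entityId": "o-001"}\n'
    '{"event": "$set", "entityType": "order", "entityId": "o-002"}\n'
)
counts = count_by_entity_type(exported)
print(counts)
```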
<div class="MsoNormal">
<br /></div>
<h2>
<span style="background: white;">An example</span></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">For our
example we will play around with a mock-up of</span> <a href="http://en.wikipedia.org/wiki/Customer_lifetime_value">customer lifetime
value</a> <span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">application
data: essentially customer and purchase data from an
e-commerce website, which we want to analyze to measure the value derived from
these customers over their lifetime engagement with our business.</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h3>
Data model setup</h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">We will
mock up the data in the following form:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 0in 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><u><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Orders<o:p></o:p></span></u></b></div>
<div class="MsoListParagraphCxSpFirst" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Order</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">: customer’s
orders<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Spend</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">:
$ amount for this order<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">City</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">:
name of the city where the purchase was made<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">State</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">:
state where the purchase was made<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Customer</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">:
Customer who made that purchase (uniqueness enforced through customer id). This allows for a one-to-many
relationship between a customer and his/her purchases. <o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 0in 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><u><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Customers<o:p></o:p></span></u></b></div>
<div class="MsoListParagraphCxSpFirst" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Customer</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">:
customer id<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Channel</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">: marketing
channel by which customer signed up<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; margin-left: 0.25in; padding: 0in; text-indent: -0.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Arial; padding: 0in;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Customer name</span></b><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">: name
of customer<o:p></o:p></span></div>
</div>
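<div class="MsoNormal">
Before translating this into events, it can help to see the mock-up as plain Python types. The sketch below mirrors the two lists above with dataclasses and shows the one-to-many relationship between a customer and his/her orders; the class names, field names and sample values are my own choices for illustration.</div>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_id: str
    spend: float      # $ amount for this order
    city: str
    state: str
    customer_id: str  # links the order back to its customer


@dataclass
class Customer:
    customer_id: str
    channel: str      # marketing channel by which the customer signed up
    name: str
    orders: List[Order] = field(default_factory=list)

    def lifetime_spend(self) -> float:
        """Total spend across all of this customer's orders."""
        return sum(order.spend for order in self.orders)


ann = Customer("c-001", "email", "Ann")
ann.orders.append(Order("o-001", 25.0, "Austin", "TX", ann.customer_id))
ann.orders.append(Order("o-002", 40.5, "Dallas", "TX", ann.customer_id))
print(ann.lifetime_spend())  # 65.5
```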
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Remember
that this must translate into the PIO Event data structure that we talked about
above. So this will look like:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 0in 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><u><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Orders<o:p></o:p></span></u></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">{<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> “event” : “$set”,<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> “entityType” : “order”,<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "entityId" : "<unique id>",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "properties"
: {<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "spend"
: "<val>",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "city"
: "<string>",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "state"
: "<string>",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "store"
: "<string>",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "customer"
: "<string>"<o:p></o:p></span></div>
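<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> }<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">}<o:p></o:p></span></div>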
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 0in 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><u><span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">Customers<o:p></o:p></span></u></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">{<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "event" : "$set",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "entityType" :
"customer",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "entityId" : "<unique
id>", <o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "properties" : {<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "DOB" : "<string>",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "channel" : "<string>",<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> "name" :
"<string>"<o:p></o:p></span></div>
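<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;"> }<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; mso-ascii-font-family: Cambria; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; mso-hansi-font-family: Cambria; padding: 0in;">}<o:p></o:p></span></div>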
</div>
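<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial;">As a small aside, here is a minimal Python sketch of how events shaped like the two templates above could be built programmatically. The helper name <code>make_set_event</code> is my own invention (it is not part of PIO), and the sample values are the ones used in the curl calls further down. Going through <code>json.dumps</code> has the nice side effect of always producing straight quotes and no stray commas.</span></div>

```python
import json


def make_set_event(entity_type, entity_id, properties):
    """Build a "$set" event shaped like the templates above (hypothetical helper)."""
    return {
        "event": "$set",
        "entityType": entity_type,
        "entityId": str(entity_id),      # entity ids are sent as strings
        "properties": dict(properties),  # copy so the caller's dict is untouched
    }


# One order and one customer, using the sample values from this post.
order = make_set_event("order", 3, {
    "spend": "4.01",
    "city": "san francisco",
    "state": "CA",
    "store": "Men Apparel",
    "customer": "1",
})
customer = make_set_event("customer", 1, {
    "DOB": "1/12/1970",
    "channel": "email",
    "name": "sam dolittle",
})

# Serialize to the JSON text that would be sent to the Event server.
order_json = json.dumps(order, indent=2)
customer_json = json.dumps(customer, indent=2)
print(order_json)
```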
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">But first,
as described in the</span> <a href="http://docs.prediction.io/templates/recommendation/quickstart/">quickstart
guide</a><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">, let’s
start an instance of the PIO Event Server. An easy way to get one that is
already set up is to use one of the pre-loaded</span> <a href="https://www.terminal.com/user/predictionio">images available on Terminal.com</a>.<o:p></o:p></div>
<div class="MsoNormal">
<br />
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Next, let’s
create a new app in which we will store our data points:</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ <b>pio app new ordersApp</b><o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">[WARN] [NativeCodeLoader] Unable to load native-hadoop library for
your platform... using builtin-java classes where applicable<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">[INFO] [HBLEvents] The table predictionio_eventdata:events_8
doesn't exist yet. Creating now...<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">[INFO] [App$] Initialized Event Store for this app ID: 8.<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">[INFO] [App$] Created new app:<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">[INFO] [App$] Name:
ordersApp<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">[INFO] [App$] ID: 8<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">[INFO] [App$] Access Key:
nE9KITDzprLR6utwUJ9a4qDhscsKsjKFlXMcMsxVEdbkQjqYRm8pFcHHDdrM6Cid<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">vagrant@vagrant-ubuntu-trusty-64:~/ $<o:p></o:p></span></div>
</div>
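<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial;">If you script this step, the app ID and access key can be pulled out of that output instead of being copied by hand. Below is a rough sketch, assuming the log lines look exactly like the ones above (other PIO versions may format them differently); <code>parse_app_info</code> is a made-up helper name.</span></div>

```python
import re

# Sample output copied from the `pio app new ordersApp` run above.
SAMPLE_OUTPUT = """\
[INFO] [App$] Initialized Event Store for this app ID: 8.
[INFO] [App$] Created new app:
[INFO] [App$] Name: ordersApp
[INFO] [App$] ID: 8
[INFO] [App$] Access Key: nE9KITDzprLR6utwUJ9a4qDhscsKsjKFlXMcMsxVEdbkQjqYRm8pFcHHDdrM6Cid
"""


def parse_app_info(output):
    """Extract name, ID and access key from `[App$] <field>: <value>` lines."""
    info = {}
    fields = (("Name", "name"), ("ID", "id"), ("Access Key", "access_key"))
    for label, key in fields:
        # The label must directly follow "[App$]", so the "app ID: 8." text
        # inside the "Initialized Event Store" line is not picked up.
        pattern = r"\[App\$\]\s+" + re.escape(label) + r":\s+(\S+)"
        match = re.search(pattern, output)
        if match:
            info[key] = match.group(1)
    return info


app_info = parse_app_info(SAMPLE_OUTPUT)
print(app_info["id"], app_info["name"])
```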
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><br />
Let’s insert a few data points for our example via the HTTP REST API, using
the access key that was generated for our app:<o:p></o:p></span></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ curl -i -X POST
http://localhost:7070/events.json?accessKey=nE9KITDzprLR6utwUJ9a4qDhscsKsjKFlXMcMsxVEdbkQjqYRm8pFcHHDdrM6Cid
-H "Content-Type: application/json" -d '{<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"event" : "$set",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"entityType" : "order",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> "entityId"
: "3",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> <o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"properties" : {<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"spend" : "4.01",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"city" : "san francisco",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"state" : "CA",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"store" : "Men Apparel",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> "customer" : "1"<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> }<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> }'<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">HTTP/1.1 201 Created<o:p></o:p></span></b></div>
</div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ <b>curl -i -X POST
http://localhost:7070/events.json?accessKey=nE9KITDzprLR6utwUJ9a4qDhscsKsjKFlXMcMsxVEdbkQjqYRm8pFcHHDdrM6Cid
-H "Content-Type: application/json" -d '{<o:p></o:p></b></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"event" : "$set",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"entityType" : "customer",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> "entityId"
: "1",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> <o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"properties" : {<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> "channel" : "email",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">>
"DOB" : "1/12/1970",<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<b><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> "name" :
"sam dolittle"<o:p></o:p></span></b></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> }<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">> }'<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">HTTP/1.1 201 Created<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">Server: spray-can/1.3.2<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">Date: Thu, 12 Mar 2015 22:31:13 GMT<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">Content-Type: application/json; charset=UTF-8<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">Content-Length: 57<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<br /></div>
</div>
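<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial;">The same call can be prepared from a script. Here is a sketch using only the Python standard library, assuming the Event server listens on localhost:7070 as in the curl calls above. <code>build_event_request</code> and the <code>YOUR_ACCESS_KEY</code> placeholder are my own names; the actual send is left commented out because it needs a running server.</span></div>

```python
import json
import urllib.parse
import urllib.request


def build_event_request(access_key, event, base_url="http://localhost:7070"):
    """Prepare (but do not send) a POST to /events.json, mirroring the curl call."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        url=f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same customer event as in the second curl call above.
customer_event = {
    "event": "$set",
    "entityType": "customer",
    "entityId": "1",
    "properties": {
        "channel": "email",
        "DOB": "1/12/1970",
        "name": "sam dolittle",
    },
}

request = build_event_request("YOUR_ACCESS_KEY", customer_event)
print(request.get_method(), request.full_url)

# Sending requires a running Event server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```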
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">We add a
few more data points in a similar way (not shown). In a real-life scenario, we
would probably ingest our existing data in batch.</span> <a href="http://docs.prediction.io/datacollection/batchimport/">Here</a> <span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">is how to do this with the
PIO API.<o:p></o:p></span></div>
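<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial;">To give an idea of what preparing such a batch could look like, here is a sketch that writes events to a file with one JSON object per line. That layout is my assumption about what the batch import expects, so please check the linked guide for the exact format and import command. The file name and the <code>write_events_file</code> helper are made up for this example.</span></div>

```python
import json
import os
import tempfile

# The two events inserted above, reused as the batch content.
events = [
    {
        "event": "$set",
        "entityType": "order",
        "entityId": "3",
        "properties": {
            "spend": "4.01",
            "city": "san francisco",
            "state": "CA",
            "store": "Men Apparel",
            "customer": "1",
        },
    },
    {
        "event": "$set",
        "entityType": "customer",
        "entityId": "1",
        "properties": {
            "channel": "email",
            "DOB": "1/12/1970",
            "name": "sam dolittle",
        },
    },
]


def write_events_file(events, path):
    """Write one compact JSON event per line (assumed batch layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for event in events:
            handle.write(json.dumps(event) + "\n")
    return path


# Hypothetical output location in the system temp directory.
events_path = write_events_file(
    events, os.path.join(tempfile.gettempdir(), "orders_events.json")
)
print(events_path)
```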
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">The data
is now stored in the Event server!<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h3>
Data export</h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">First,
let’s export the data from our app.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ bin/pio export --appid 8 --output /home/exportFinal3 --format
parquet<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Let’s
verify our export by firing a few parquet commands (Parquet can simply be </span><a href="http://parquet.incubator.apache.org/"><span style="background: white; font-family: Arial; font-size: 10.0pt; mso-fareast-font-family: "Times New Roman";">downloaded
to run this</span></a><span style="background: white; color: #222222; font-family: Arial; font-size: 10.0pt; mso-fareast-font-family: "Times New Roman";">) </span><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">on the generated result:</span><span style="background: white; color: #222222; font-family: Arial; font-size: 10.0pt; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ <b>hadoop parquet.tools.Main
cat part-r-1.parquet</b> <o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">entityId = 2 entityType = user event = $set eventId =
5iuLzCHXzzehq_R3hjsL1AAAAUvZAVo8gzcZ76dDAEs eventTime =
2015-03-02T05:41:59.484Z properties: .rating = 2.0 targetEntityId = 98
targetEntityType = item<br />
creationTime =
2015-03-04T05:29:51.278Z entityId = 1 entityType = order event = $set eventId =
78r6v2QT5GgWWrt_bD_q7wAAAUvjQvWunS1ONY2GyoM eventTime =
2015-03-04T05:29:51.278Z properties: .city = san jose .spend = 11.99 .state =
CA .store = Women Apparel .customer = 1<o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ <b>hadoop parquet.tools.Main schema
part-r-1.parquet</b> <o:p></o:p></span></div>
<div class="MsoNormal" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: none; line-height: 17.4pt; padding: 0in; vertical-align: baseline;">
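<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">message root {<br />
&nbsp;&nbsp;optional binary creationTime (UTF8);<br />
&nbsp;&nbsp;optional binary entityId (UTF8);<br />
&nbsp;&nbsp;optional binary entityType (UTF8);<br />
&nbsp;&nbsp;optional binary event (UTF8);<br />
&nbsp;&nbsp;optional binary eventId (UTF8);<br />
&nbsp;&nbsp;optional binary eventTime (UTF8);<br />
&nbsp;&nbsp;optional group properties {<br />
&nbsp;&nbsp;&nbsp;&nbsp;optional binary city (UTF8);<br />
&nbsp;&nbsp;&nbsp;&nbsp;optional double rating;<br />
&nbsp;&nbsp;&nbsp;&nbsp;optional binary spend (UTF8);<br />
&nbsp;&nbsp;&nbsp;&nbsp;optional binary state (UTF8);<br />
&nbsp;&nbsp;&nbsp;&nbsp;optional binary store (UTF8);<br />
&nbsp;&nbsp;&nbsp;&nbsp;optional binary customer (UTF8);<br />
&nbsp;&nbsp;}<br />
&nbsp;&nbsp;optional binary targetEntityId (UTF8);<br />
&nbsp;&nbsp;optional binary targetEntityType (UTF8);<br />
} <o:p></o:p></span></div>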
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Arial; mso-border-alt: none windowtext 0in; padding: 0in;">We end up with two sets of <i>Event</i> data that we can now freely
explore in any BI tool. The next section demonstrates this, and also
joins the two datasets.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h3>
<span style="border: none windowtext 1.0pt; mso-border-alt: none windowtext 0in; padding: 0in;">Data exploration</span></h3>
<div class="MsoNormal">
<br /></div>
<h4>
<span style="border: none windowtext 1.0pt; mso-border-alt: none windowtext 0in; padding: 0in;">Business Intelligence tool: iPython<o:p></o:p></span></h4>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> Let’s follow the guide in the
documentation to start exploring this data. For our needs, we will use iPython,
although we could use a lot of other tools as well. A good guide about using
iPython is</span> <a href="http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/">here</a>.<o:p></o:p></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">For your
convenience, here is a</span> <a href="https://www.terminal.com/tiny/Ll7tm5sD5M">terminal.com-powered
image</a><span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> that
comes with a Spark-enabled iPython setup.</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h4>
Set-up of pySpark<o:p></o:p></h4>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> We will use SQL code to interact
with our data, in a DataFrame environment. So first, let’s make sure that our
Python code can talk to Spark/Spark SQL. Here is the initial setup:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig13AabHQ2EAs8PhNhRGM8dcMcgYRIb89pjhSjE7kfi4Po4RPNLe3MiUZS-QJ57OUseedJkbjp0SMNzN_dZsRTd_nFPUhyphenhyphen8YNfIqSH-FJlmyarCwPwSs6HSdBd6uK4uGSrFGA2FsqtylIM/s1600/notebook2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig13AabHQ2EAs8PhNhRGM8dcMcgYRIb89pjhSjE7kfi4Po4RPNLe3MiUZS-QJ57OUseedJkbjp0SMNzN_dZsRTd_nFPUhyphenhyphen8YNfIqSH-FJlmyarCwPwSs6HSdBd6uK4uGSrFGA2FsqtylIM/s1600/notebook2.png" /></a></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">We will
first perform some data explorations via Spark SQL run in</span> <a href="http://spark.apache.org/docs/latest/api/python/">Python pyspark mode</a>,
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">directly
in our iPython notebook.</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h4>
SQL queries on our data<o:p></o:p></h4>
<div class="MsoNormal">
<br /></div>
<h5>
<u>Order table<o:p></o:p></u></h5>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> Let’s first explore our <i>order</i> table. For this we will query our
exported <i>Events</i> Parquet data for the
entityType ‘<i>order</i>’ (likewise, filter on
‘<i>customer</i>’ within <i>Events</i> to get the customer data):<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaqY6V_iXYNi3_0qvT4xwGoYX_jjf0nAAgHUl1TyDu4waFKxJrGYYGCtVUGiVAwewJnoGiFMKcrDT7zTe5Xpp8tyqgJb9HcKAc9NGP3J0kOY4Jf6Fg5T5PxMwj3bK0VLjiAhChpChUMHXW/s1600/note3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaqY6V_iXYNi3_0qvT4xwGoYX_jjf0nAAgHUl1TyDu4waFKxJrGYYGCtVUGiVAwewJnoGiFMKcrDT7zTe5Xpp8tyqgJb9HcKAc9NGP3J0kOY4Jf6Fg5T5PxMwj3bK0VLjiAhChpChUMHXW/s1600/note3.png" /></a></div>
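<div class="MsoNormal">
To make the filtering idea concrete outside of Spark, here is a small plain-Python stand-in (not the Spark SQL from the notebook). It keeps only the events of one entityType and flattens their properties into columns. The two records are shaped like the <i>cat</i> output shown earlier, and the helper name <i>rows_for</i> is made up for this sketch.</div>

```python
# Two records shaped like the parquet `cat` output shown earlier.
events = [
    {"entityId": "2", "entityType": "user", "event": "$set",
     "properties": {"rating": 2.0},
     "targetEntityId": "98", "targetEntityType": "item"},
    {"entityId": "1", "entityType": "order", "event": "$set",
     "properties": {"city": "san jose", "spend": "11.99", "state": "CA",
                    "store": "Women Apparel", "customer": "1"}},
]


def rows_for(entity_type, records):
    """Keep events of one entityType and flatten properties into columns."""
    rows = []
    for rec in records:
        if rec["entityType"] == entity_type:
            row = {"entityId": rec["entityId"]}
            row.update(rec["properties"])
            rows.append(row)
    return rows


order_rows = rows_for("order", events)
for row in order_rows:
    print(row)
```

<div class="MsoNormal">
Calling the same helper with another entityType gives the other table, which is the same trick used on <i>Events</i> in the notebook.</div>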
<br />
<h5>
<u><span style="background: white;">Customer table<o:p></o:p></span></u></h5>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> Let’s explore our <i>customer</i> table a slightly different way,
and create a Hive SQL table from our data this time, using the same filtering
clause mechanism on <i>Events</i> as
earlier:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGF6pVzI_bjLKYbbw1JHzq15xzB7aucl4A9B2_oc5IzvOIKrelg17eQp0HVdPDzirCr1hbOcX1v7s1vvlL_T1FwcgJcreK0CtClZcc-qFjycSaprOC0P2Cbw1sFyphDhp4XbrUCv_B0vLe/s1600/note4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGF6pVzI_bjLKYbbw1JHzq15xzB7aucl4A9B2_oc5IzvOIKrelg17eQp0HVdPDzirCr1hbOcX1v7s1vvlL_T1FwcgJcreK0CtClZcc-qFjycSaprOC0P2Cbw1sFyphDhp4XbrUCv_B0vLe/s1600/note4.png" /></a></div>
<div class="MsoNormal">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><br /></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">We also
created an <i>order</i> table.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h3>
<span style="background: white;">Customer lifetime value query</span></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> We can now join the two <i>Events</i> datasets, <i>order</i> and <i>customer</i>, to
get an overall picture of our customer lifetime value and see, for example,
which marketing channel is the most prevalent:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheQimNYeX7mwTDOOyu5EuoQtOgURAuBvRRRbFk5qsXmXqrvUi2VUL478Rzjm4QwPzsE2QLX-pdZpz_H4IHC-2G1cl7l9FIH6Atf8DzaHnCHep0Iid2mJfSyv5_LUPHVJDj5UPVe9IiImB_/s1600/pio6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheQimNYeX7mwTDOOyu5EuoQtOgURAuBvRRRbFk5qsXmXqrvUi2VUL478Rzjm4QwPzsE2QLX-pdZpz_H4IHC-2G1cl7l9FIH6Atf8DzaHnCHep0Iid2mJfSyv5_LUPHVJDj5UPVe9IiImB_/s1600/pio6.png" /></a></div>
<div class="MsoNormal">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><br /></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">It seems
like the answer is ‘email’!<o:p></o:p></span></div>
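<div class="MsoNormal">
For readers without a Spark setup at hand, the shape of such a join can be tried with Python's built-in sqlite3 module, used here purely as a stand-in for Spark SQL. The table layout, the <i>channel</i> column name and the sample rows are all assumptions of mine for this sketch, not the data from the notebook.</div>

```python
import sqlite3

# In-memory database standing in for the two tables built from Events.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id TEXT, channel TEXT);
CREATE TABLE orders (id TEXT, customer TEXT, spend TEXT);
INSERT INTO customer VALUES ('1', 'email'), ('2', 'search'), ('3', 'email');
INSERT INTO orders VALUES ('1', '1', '11.99'), ('2', '2', '5.00'),
                          ('3', '3', '20.00'), ('4', '1', '8.01');
""")

# Total spend per marketing channel: join each order to its customer,
# cast the string-typed spend to a number, then aggregate.
query = """
SELECT c.channel,
       COUNT(DISTINCT c.id) AS customers,
       ROUND(SUM(CAST(o.spend AS REAL)), 2) AS total_spend
FROM orders o
JOIN customer c ON o.customer = c.id
GROUP BY c.channel
ORDER BY total_spend DESC
"""

clv_by_channel = conn.execute(query).fetchall()
for channel, customers, total_spend in clv_by_channel:
    print(channel, customers, total_spend)
```

<div class="MsoNormal">
The cast is there because <i>spend</i> is stored as a string in the exported schema, so it has to be converted before it can be summed.</div>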
<div class="MsoNormal">
<br /></div>
<h2>
<span style="border: none windowtext 1.0pt; mso-border-alt: none windowtext 0in; padding: 0in;">Summary<o:p></o:p></span></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">In this
post, we have discovered how the PIO Event server can be a data repository of
choice for event data. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">We then
exported that data and explored it further by way of a simple customer lifetime
value example, using open source tools like Spark, Python and iPython
notebooks.<o:p></o:p></span></div>
<div class="MsoNormal">
<!--EndFragment--></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Hope it
was fun! <o:p></o:p></span></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-33763021710796833902015-05-18T09:28:00.000-07:002015-05-18T10:03:37.275-07:00Internet of Things conference 2015 - hackaton<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtDwPv05lzo2wjTW85ZJJYerU28OYbNJLGkLFXPAglX3yfOSaf0UyQ7swMpl47Ndu95cZthj3HpKwr82WgRZ_0Z9Nak0N7fNL1B2VpAlyoSRh__5iYFyLXBKw2lIA8NkiVU9op2v_W3uSv/s1600/iot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><br class="Apple-interchange-newline" /><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtDwPv05lzo2wjTW85ZJJYerU28OYbNJLGkLFXPAglX3yfOSaf0UyQ7swMpl47Ndu95cZthj3HpKwr82WgRZ_0Z9Nak0N7fNL1B2VpAlyoSRh__5iYFyLXBKw2lIA8NkiVU9op2v_W3uSv/s1600/iot.png" /></a></div>
<br />
My friend Julien and I presented a mini-project at the Internet of Things 2015 conference in San Francisco, as part of the <a href="http://iotworldevent.com/hack/">hackathon</a>. Here are the <a href="https://www.slideshare.net/slideshow/embed_code/key/BZqYUGAcYaMNtq">slides</a> as well as part of <a href="https://github.com/mlieber/code">the code</a>.<br />
<br />
<b>Project summary</b><br />
<br />
Essentially, the project was about designing a "smart trash" bin that sends an alert (email, SMS) when something other than what the bin is supposed to contain is dumped into it, e.g. plastic in a compostable bin that should only contain food.<br />
<br />
<br />
<b>Architecture</b><br />
<br />
Technically, the project consists of:<br />
<br />
- a <a href="https://www.tessel.io/">Tessel</a> microcontroller<br />
- a Tessel mini-camera sensor<br />
- an AI tagging API, from <a href="https://www.tessel.io/">AlchemyAPI</a><br />
- code in JS/Node.js<br />
<br />
This project attempts to demonstrate the ease with which different APIs, and even hardware, can be connected together seamlessly to quickly build prototypes.<br />
<div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com1tag:blogger.com,1999:blog-170648781806274754.post-82025798639930204532015-03-31T14:23:00.000-07:002015-03-31T14:23:08.869-07:00How is fault tolerance handled in Spark streaming? An overview<div style="background-color: white; color: #282828;">
<div class="MsoNormal" style="background: white;">
<span style="color: #282828; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><br />
</span><span style="color: black; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">While trying to get my head around fault tolerance in
Spark streaming, and in light of the <a href="https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html">recent
changes made to it</a>, I wrote down my high-level understanding of it below, based on
conversations with a colleague.</span><span style="color: #282828; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white;">
<br /></div>
<div class="MsoNormal" style="background: white;">
<span style="color: black; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">First,
the basics:</span><span style="color: black; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></div>
<h1>
Spark Streaming components<o:p></o:p></h1>
<div class="MsoNormal" style="background: white;">
<br /></div>
<h2>
Data model<span style="font-family: Times;"><o:p></o:p></span></h2>
<div class="MsoNormal" style="background: white;">
<br /></div>
<div class="MsoNormal" style="background: white;">
<span style="color: black; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><span style="mso-tab-count: 1;"> </span>All data is modeled as RDDs, which by
design carry a lineage of deterministic operations, i.e. any re-computation always
leads to the same result. The outcome is essentially the same (though achieved with a
different mechanism) as Hadoop's fault tolerance for slave-node failures.</span><span style="color: black; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white;">
<br /></div>
<ul type="disc">
<li class="MsoNormal" style="color: black; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">An <b>RDD</b> is
an immutable, deterministically re-computable, distributed dataset in
Spark.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></li>
<li class="MsoNormal" style="color: black; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">A <b>DStream</b> is
an abstraction used in Spark streaming over RDDs, which is
essentially a stream of RDDs. A lot of the same APIs apply over DStreams.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></li>
</ul>
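<div class="MsoNormal" style="background: white;">
To illustrate the lineage idea, here is a toy Python sketch. It is not Spark code: the <i>ToyRDD</i> class is entirely made up, and it only shows why an immutable source plus a list of deterministic operations is enough to rebuild a lost result.</div>

```python
class ToyRDD:
    """Toy stand-in for an RDD: immutable source data plus a lineage."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # immutable input data
        self._lineage = tuple(lineage)  # ordered, deterministic operations

    def map(self, fn):
        # A transformation returns a new ToyRDD; nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (fn,))

    def compute(self):
        # Replay the whole lineage from the source data.
        data = list(self._source)
        for fn in self._lineage:
            data = [fn(x) for x in data]
        return data


rdd = ToyRDD([1, 2, 3]).map(lambda x: x * 10).map(lambda x: x + 1)
first_result = rdd.compute()

# Pretend the computed result was lost with a failed worker:
# replaying the lineage gives back exactly the same values.
recomputed = rdd.compute()
print(first_result, recomputed)
```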
<h2>
<br />
Types of nodes<span style="font-family: Times;"><o:p></o:p></span></h2>
<ul type="disc">
<li class="MsoNormal" style="color: black; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">Worker</span></b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"> node:
slave nodes, running the application code on the cluster</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></li>
<li class="MsoNormal" style="color: black; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">Driver</span></b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"> node:
main program of the application. Similar to the Application Master in the
Hadoop YARN world, the Driver owns the Spark context, and hence all the state
of the application. </span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></li>
</ul>
<h2>
<br />
Main components in a streaming application<span style="font-family: Times;"><o:p></o:p></span></h2>
<ul type="disc">
<li class="MsoNormal" style="color: black; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">Driver: </span></b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-weight: bold;">akin to the master node in a Storm application
from a conceptual point of view.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></li>
<li class="MsoNormal" style="color: black; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">Receiver:</span></b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"> the
Receiver, living in a worker node, is similar to a spout in Apache Storm,
and consumes the data from source; there are <a href="http://spark.apache.org/docs/latest/streaming-programming-guide.html#receiver-reliability">already
built-in receivers OOTB</a> for the common ones.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></li>
<li class="MsoNormal" style="color: black; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";">Executor:</span></b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"> this
processes the data; similar to a bolt in Apache Storm from a conceptual
point of view.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></li>
</ul>
<div class="MsoNormal" style="background: white;">
<br /></div>
<h1>
Main steps in a Streaming application<span style="font-family: Times;"><o:p></o:p></span></h1>
<div class="MsoNormal" style="background: white;">
<br /></div>
<div class="MsoNormal" style="background: white;">
<span style="color: black; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><span style="mso-tab-count: 1;"> </span>There are essentially three steps in a
streaming application, so understanding the record processing guarantees (<i style="mso-bidi-font-style: normal;">at least once</i>, <i style="mso-bidi-font-style: normal;">at most once </i>or <i style="mso-bidi-font-style: normal;">exactly-once</i>
semantics) at each step is essential:</span><span style="color: black; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-left: .5in; mso-list: l1 level1 lfo1; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #282828; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: Times; mso-fareast-font-family: Times;"><span style="mso-list: Ignore;">1.<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><u><span style="color: #282828; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Receiving the streaming
data</span></u><span style="color: #282828; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<ul type="disc">
<li class="MsoNormal" style="mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Depending on the kind of input source, at this step reliable
vs. unreliable receivers are used; e.g. a stream from a file (local or
HDFS) is reliable, a Kafka stream is reliable, but data read directly from a
socket connection is unreliable. <o:p></o:p></span></li>
<li class="MsoNormal" style="color: #282828; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><span style="color: windowtext; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">In Spark
streaming when the data is received from any receiver, it is by default
replicated (in memory) to two worker nodes, after which if the receiver
was reliable,</span><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">
<a href="http://spark.apache.org/docs/latest/streaming-custom-receivers.html">the
acknowledgement is sent.</a></span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> </span><span style="color: windowtext; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">In</span><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">
the case of an unreliable receiver, no acknowledgement is sent, so the data can be lost if a failure occurs (i.e. an <i style="mso-bidi-font-style: normal;">at most once</i> scenario).</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></li>
<li class="MsoNormal" style="color: #282828; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">In the event of a failure of the Driver
node, the Spark context is lost, and with it all the past data. The first
remedy is Spark's own WAL (write-ahead log), but the cleaner
way, if the data sender allows for it, is to simply re-use and <a href="https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html?utm_content=13373883&utm_medium=social&utm_source=twitter">consume
the sender's WAL instead.</a></span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></li>
</ul>
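<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
The write-ahead idea from the last point can be sketched in a few lines of plain Python. This is a toy model and not Spark's implementation: the <i>ToyReceiver</i> class is made up, and a simple list stands in for durable storage.</div>

```python
class ToyReceiver:
    """Toy model: persist each record to a log before acknowledging it."""

    def __init__(self, log):
        self.log = log      # stands in for durable storage
        self.buffer = []    # in-memory data, lost when the process dies

    def receive(self, record):
        self.log.append(record)     # 1. write ahead to the durable log
        self.buffer.append(record)  # 2. keep it in memory for processing
        return "ack"                # 3. only now acknowledge to the source

    def crash(self):
        # Simulate a failure: everything held in memory is gone.
        self.buffer = []

    def recover(self):
        # Rebuild the in-memory state by replaying the log.
        self.buffer = list(self.log)


durable_log = []
receiver = ToyReceiver(durable_log)
for rec in ["a", "b", "c"]:
    receiver.receive(rec)

receiver.crash()
after_crash = list(receiver.buffer)

receiver.recover()
after_recovery = list(receiver.buffer)
print(after_crash, after_recovery)
```

<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
When the source keeps its own replayable log, the same recovery can be done by re-reading from the source, which is the cleaner option mentioned above.</div>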
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
<br /></div>
<div class="MsoListParagraph" style="mso-add-space: auto; mso-list: l1 level1 lfo1; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #282828; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: Times; mso-fareast-font-family: Times;"><span style="mso-list: Ignore;">2.<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><u><span style="color: #282828; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Transform the data</span></u><span style="color: #282828; font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<ul type="disc">
<li class="MsoNormal" style="color: #282828; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">At this stage we have a guarantee
of <i style="mso-bidi-font-style: normal;">exactly-once</i> semantics thanks to
the underlying RDD guarantees; i.e.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"> </span><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">if
a worker node fails, the transformation is recomputed on another
node where the data is replicated.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></li>
</ul>
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
<br /></div>
<div class="MsoListParagraph" style="mso-add-space: auto; mso-list: l1 level1 lfo1; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #282828; font-family: Arial; font-size: 13.5pt; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">3.<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><u><span style="color: #282828; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Output the transformed
data<o:p></o:p></span></u></div>
<ul type="disc">
<li class="MsoNormal" style="color: #282828; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Output operations have <i>at
least once</i> semantics, that is, the transformed data may get
written to an external entity more than once in the event of a worker
failure. Additional effort may be necessary to achieve <i style="mso-bidi-font-style: normal;">exactly-once</i> semantics. There are
two approaches.</span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></li>
<li class="MsoNormal" style="color: #282828; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-style: italic; mso-fareast-font-family: "Times New Roman";">Idempotent updates</span></b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">: Multiple attempts always write the same data. </span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></li>
<li class="MsoNormal" style="color: #282828; mso-list: l0 level1 lfo2; mso-margin-bottom-alt: auto; mso-margin-top-alt: auto; tab-stops: list .5in;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-style: italic; mso-fareast-font-family: "Times New Roman";">Transactional updates</span></b><span style="font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">: All updates are made atomically so that updates are
made exactly once. </span><span style="font-family: Times; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></li>
</ul>
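<div class="MsoNormal">
Both approaches can be illustrated without Spark. The sketch below is an assumption-laden example, not the Spark API: the function names, the dictionary store and the in-memory SQLite tables are all hypothetical stand-ins for an external system. In a streaming job, the batch identifier and partition index would typically come from the batch time and the RDD partition being written.</div>

```python
import sqlite3


def idempotent_write(store, batch_id, partition_id, records):
    """Idempotent update: the key identifies the batch and partition, so a
    retried attempt overwrites the earlier one instead of appending to it."""
    store[(batch_id, partition_id)] = list(records)


def open_sink():
    """Create an in-memory SQLite database standing in for the external store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (value INTEGER)")
    conn.execute(
        "CREATE TABLE commits (batch_id INTEGER, partition_id INTEGER, "
        "PRIMARY KEY (batch_id, partition_id))"
    )
    return conn


def transactional_write(conn, batch_id, partition_id, records):
    """Transactional update: the data and a commit marker are written in one
    atomic transaction. Returns False if this batch/partition was already
    committed, in which case nothing is written."""
    with conn:  # commits on success, rolls back if an exception is raised
        done = conn.execute(
            "SELECT 1 FROM commits WHERE batch_id = ? AND partition_id = ?",
            (batch_id, partition_id),
        ).fetchone()
        if done:
            return False
        conn.executemany(
            "INSERT INTO events (value) VALUES (?)", [(r,) for r in records]
        )
        conn.execute(
            "INSERT INTO commits (batch_id, partition_id) VALUES (?, ?)",
            (batch_id, partition_id),
        )
    return True
```

<div class="MsoNormal">
Writing the same batch twice with either function leaves the store in the same state as writing it once, which is what turns at-least-once delivery into an exactly-once result.</div>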
<h1>
Example<o:p></o:p></h1>
<h1>
<span style="color: #282828; font-family: Arial; font-size: 13.5pt; font-weight: normal; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-weight: bold; mso-fareast-font-family: "Times New Roman";">Let’s say there is a batch of
events, and one of the operations maintains a ‘global count’, that is, a
counter of the total number of events streamed so far. Suppose that, midway
through processing the batch, the node doing the processing goes down.
What happens now?<o:p></o:p></span></h1>
<h1>
<span style="color: #282828; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Does the
global count reflect the events that were processed before the failure? </span><span style="font-family: Times;"><o:p></o:p></span></h1>
<br />
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
<span style="color: #282828; font-family: Arial; font-size: 13.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">Strictly speaking, for a
global count there is a <a href="https://spark.apache.org/docs/latest/programming-guide.html#accumulators">built-in
global counter</a> (an accumulator) available in Spark which takes care of this problem,
provided the updates happen inside an action; updates made inside a transformation
may be applied more than once if a task is re-executed. But the counter
is just an example: in all other situations, the
lineage of transformations applied to the whole batch of data remedies this.
As mentioned, RDD transformations are deterministically re-computable, which
means the re-computation gives the same resulting state. However, if the
result also needs to be stored externally, that logic needs to be handled independently.<o:p></o:p></span></div>
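<div class="MsoNormal">
The global-count scenario can be mimicked in plain Python to show why a half-processed batch does not leak into the result. This is a simplified model, not Spark itself; <code>WorkerFailure</code>, <code>count_events</code> and <code>process_batch</code> are hypothetical names chosen for the illustration.</div>

```python
class WorkerFailure(Exception):
    """Raised to simulate a worker node going down in the middle of a task."""


def count_events(events, fail_after=None):
    """One task attempt: count the events of a batch.

    If fail_after is set, the attempt crashes once that many events have been
    counted, and its partial count is lost together with the worker.
    """
    count = 0
    for event in events:
        if fail_after is not None and count == fail_after:
            raise WorkerFailure("worker lost mid-batch")
        count += 1
    return count


def process_batch(global_count, events, fail_after=None):
    """Driver side: the global count changes only when an attempt completes."""
    try:
        batch_count = count_events(events, fail_after)
    except WorkerFailure:
        # The input batch is still available (replicated), so the same
        # deterministic computation is simply run again from the start.
        batch_count = count_events(events)
    return global_count + batch_count


total = process_batch(10, ["e1", "e2", "e3", "e4", "e5", "e6"], fail_after=3)
```

<div class="MsoNormal">
Here the first attempt dies after three events, yet <code>total</code> ends up as 16: the partial count never reaches the global state, and the retry recomputes the full batch of six.</div>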
</div>
<ol style="color: #282828;">
</ol>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com2tag:blogger.com,1999:blog-170648781806274754.post-6487580644121131362015-03-02T16:28:00.000-08:002015-03-23T18:57:47.438-07:00Converting Avro data to Parquet format in Hadoop<div class="MsoNormal">
<b>Update: this post is now part of the Cloudera blog, found at <a href="http://ow.ly/KAKmz">ow.ly/KAKmz</a></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A customer of mine wants the best of both worlds:
to keep working with his existing Apache Avro data, with all of the advantages that it
confers, while also benefiting from the predicate push-down features that Parquet
provides. How can the two be reconciled?<o:p></o:p></div>
<div class="MsoNormal">
For more information about combining these formats, see <a href="http://grepalex.com/2014/05/13/parquet-file-format-and-object-model/">this</a>.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
For a quick recap on Avro, see my previous <a href="http://matthieulieber.blogspot.com/2015/01/how-to-load-some-avro-data-into-spark.html">post</a>.
While you are at it, see why Apache Avro is currently the <a href="http://blog.confluent.io/2015/02/25/stream-data-platform-2/">gold
standard in the industry</a>.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
What we are going to demonstrate here: how to use
existing tools to convert our existing Avro data into Parquet, and make
sure we can query the converted data. <o:p></o:p></div>
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
<br /></div>
<h1>
Parquet data<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
First, let’s try to convert text data to Parquet, and read it
back. Fortunately, Cloudera already provides some code to <a href="https://github.com/cloudera/parquet-examples">do this in MapReduce</a>. </div>
<div class="MsoNormal">
That code (<a href="https://github.com/cloudera/parquet-examples">https://github.com/cloudera/parquet-examples</a>),
together with the documentation <a href="http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_parquet.html">here</a>,
lets you read and write Parquet data. Let’s try this.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
First, let’s create some Parquet data as input. We will use
Hive for this, by directly converting Text data into Parquet.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Parquet conversion<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
1. Let’s create some example CSV data, and a text table in Hive
(here, just 2 columns of integers) to load it into:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">create table mycsvtable (x int, y int)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">row format delimited<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">FIELDS TERMINATED BY ','<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">STORED AS TEXTFILE;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">LOAD DATA LOCAL INPATH '/home/cloudera/test/' OVERWRITE INTO TABLE
mycsvtable; <o:p></o:p></span></div>
</div>
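<div class="MsoNormal">
If you do not have a CSV file at hand, a few lines of Python can generate one that matches the two integer columns of the table. This is just a convenience sketch: the function names and the output file name are assumptions, and the file should be written into whatever directory your LOAD DATA statement points to.</div>

```python
import csv
import random


def sample_rows(n, seed=0):
    """Return n rows of two random integers each, matching columns (x, y)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [(rng.randint(0, 100), rng.randint(0, 100)) for _ in range(n)]


def write_csv(path, rows):
    """Write the rows as comma-separated lines, without a header line."""
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


if __name__ == "__main__":
    # "sample.csv" is a hypothetical file name; adjust the path as needed.
    write_csv("sample.csv", sample_rows(10))
```

<div class="MsoNormal">
No header line is written, since the table definition above does not skip one and would otherwise try to read it as a data row.</div>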
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
2. Create a Parquet table in Hive, and convert the data to
it:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">create table myparquettable (a INT, b INT)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">STORED AS PARQUET<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">LOCATION '/tmp/data';<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">insert overwrite table myparquettable select * from mycsvtable;<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoListParagraph" style="margin-left: .25in; mso-add-space: auto; mso-list: l0 level1 lfo2; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-bidi-font-family: Cambria; mso-bidi-theme-font: minor-latin; mso-fareast-font-family: Cambria; mso-fareast-theme-font: minor-latin;"><span style="mso-list: Ignore;">3.<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Add the Hadoop and Parquet libraries
that the code depends on to the project's build path (in Eclipse, for example)
so that it compiles. Then export the code as a JAR (File -> Export -> Runnable
JAR file) and run it outside of Eclipse; otherwise, Hadoop security issues
prevent the code from running.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraph" style="margin-left: .25in; mso-add-space: auto; mso-list: l0 level1 lfo2; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-bidi-font-family: Cambria; mso-bidi-theme-font: minor-latin; mso-fareast-font-family: Cambria; mso-fareast-theme-font: minor-latin;"><span style="mso-list: Ignore;">4.<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="mso-spacerun: yes;"> </span>Run the
program (you could also run it with plain java instead of hadoop if you copy the data from HDFS
to local disk). The arguments are the input path (Parquet data) and the output path (CSV). We
just want to confirm that we can read the Parquet data and display it.<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ sudo hadoop jar ./testparquet.jar hdfs:///home/cloudera/test/data/000000_0
hdfs:///home/cloudera/test/dataparquet <o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-spacerun: yes;"> </span>See result: (csv
file):<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ more test/dataparquet2/part-m-00000<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">1,2<br />3,4<br />5,6<o:p></o:p></span></div>
</div>
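<div class="MsoNormal">
To go one step beyond eyeballing the output, here is a minimal sketch (in Python, standard library only) that parses the CSV text back into integer pairs, matching the two INT columns of the table. The function name read_int_pairs and the sample string are only illustrative; they are not part of the Java program used above.</div>

```python
import csv
import io


def read_int_pairs(csv_text):
    """Parse two-column CSV text into a list of (a, b) integer tuples."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # skip blank lines
        a, b = record  # raises ValueError if a line does not have exactly two fields
        rows.append((int(a), int(b)))
    return rows


# Sample text mirroring the three rows displayed above (assumed one row per line).
sample = "1,2\n3,4\n5,6\n"
print(read_int_pairs(sample))
```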
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
Avro data conversion<o:p></o:p></h1>
<h2>
Avro data example<o:p></o:p></h2>
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
Let’s
get an Avro data example working, based on <a href="http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/">this
post</a>.<o:p></o:p></div>
<h2>
Avro data generation<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Interestingly, Hive doesn’t let you load/convert CSV data
into Avro the way we did in the Parquet example. <span style="mso-spacerun: yes;"> </span><o:p></o:p></div>
<div class="MsoNormal">
Let’s walk through an example of defining an Avro schema
(as a JSON .avsc file) and generating some data. We will follow <a href="http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/">this
example</a>, with this twitter.avsc schema:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">{<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span><span style="mso-spacerun: yes;"> </span>"type" : "record",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"name" :
"twitter_schema",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"namespace" :
"com.miguno.avro",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"fields" : [ <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">{<span style="mso-spacerun: yes;"> </span>"name" :
"username",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"type" :
"string",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"doc"<span style="mso-spacerun: yes;"> </span>: "Name of the user account on
Twitter.com"<span style="mso-spacerun: yes;"> </span>},<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>{<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"name" :
"tweet",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"type" :
"string",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"doc"<span style="mso-spacerun: yes;"> </span>: "The content of the user's Twitter
message"<span style="mso-spacerun: yes;"> </span>},<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>{<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"name" :
"timestamp",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"type" :
"long",<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"doc"<span style="mso-spacerun: yes;"> </span>: "Unix epoch time in seconds"<span style="mso-spacerun: yes;"> </span>} ],<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>"doc" :
"A basic schema for storing Twitter messages" }<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
and some data in twitter.json:<span style="mso-tab-count: 1;"> </span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">{"username":"miguno","tweet":"Rock:
Nerf paper, scissors is fine.","timestamp": 1366150681 }
{"username":"BlizzardCS","tweet":"Works as
intended.<span style="mso-spacerun: yes;"> </span>Terran is
IMBA.","timestamp": 1366154481 }<o:p></o:p></span></div>
</div>
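<div class="MsoNormal">
Before converting, it can be handy to check that each JSON record has exactly the fields the schema declares. The sketch below does that with the Python standard library only; record_errors and PYTHON_TYPES are hypothetical helpers written for this post, not part of Avro, and they only cover the two primitive types used in twitter.avsc (avro-tools performs its own validation during the conversion).</div>

```python
import json

# Subset of twitter.avsc: only the field names and types are needed here.
SCHEMA = json.loads("""
{ "type": "record", "name": "twitter_schema",
  "fields": [ {"name": "username", "type": "string"},
              {"name": "tweet", "type": "string"},
              {"name": "timestamp", "type": "long"} ] }
""")

# Python types accepted for the Avro primitive types used in this schema
# (an assumption of this sketch, not an Avro API).
PYTHON_TYPES = {"string": str, "long": int}


def record_errors(record, schema=SCHEMA):
    """Return a list of problems found in one decoded JSON record."""
    errors = []
    expected = {field["name"]: field["type"] for field in schema["fields"]}
    for name, avro_type in expected.items():
        if name not in record:
            errors.append("missing field: " + name)
        elif not isinstance(record[name], PYTHON_TYPES[avro_type]):
            errors.append("wrong type for field: " + name)
    for name in record:
        if name not in expected:
            errors.append("unexpected field: " + name)
    return errors


# First record of twitter.json, written on a single line.
line = '{"username":"miguno","tweet":"Rock: Nerf paper, scissors is fine.","timestamp": 1366150681 }'
print(record_errors(json.loads(line)))
```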
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
We will convert the data (in JSON) into binary Avro format.<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ java -jar ~/avro-tools-1.7.7.jar fromjson --schema-file
twitter.avsc twitter.json > twitter.avro </span><span style="color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p></o:p></span></div>
</div>
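<div class="MsoNormal">
An Avro object container file is self-describing: it starts with the four magic bytes "Obj" followed by 1, then a metadata map that embeds the writer schema under the key avro.schema. As a rough sketch (Python standard library only, helper names are my own), the header can be decoded by hand to confirm that the schema made it into the file. The demo header is built in memory and is not real avro-tools output.</div>

```python
import io

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one zigzag-encoded, variable-length Avro long."""
    shift, accum = 0, 0
    while True:
        byte = stream.read(1)[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def read_header_metadata(stream):
    """Return the file metadata map (string keys, bytes values)."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # an empty block terminates the map
            break
        if count < 0:  # negative count: a byte size follows, which we skip
            read_long(stream)
            count = -count
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            meta[key] = stream.read(read_long(stream))
    return meta


# Hand-built header: magic, one map entry, end-of-map marker, 16-byte sync marker.
demo = (AVRO_MAGIC + bytes([2, 22]) + b"avro.schema"
        + bytes([16]) + b'"string"' + bytes([0]) + bytes(16))
print(read_header_metadata(io.BytesIO(demo)))
```

<div class="MsoNormal">
On the real file, the same function would be called as read_header_metadata(open("twitter.avro", "rb")), and the avro.schema value should contain the JSON of twitter.avsc.</div>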
<div class="MsoNormal">
<br /></div>
<h2>
Transformation from Avro to Parquet storage
format<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The idea is to get the best of both worlds: take advantage
of Avro's object model and serialization format, and combine them with
the columnar storage format of Parquet.<span style="font-family: Times; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman";"><br />
</span><span style="mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">First we will reuse our Avro data that was created earlier.</span><o:p></o:p></div>
<div class="MsoNormal">
<span style="mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><br /></span></div>
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
<span style="mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">1.
We will then take advantage of this code: <a href="https://github.com/laserson/avro2parquet">https://github.com/laserson/avro2parquet</a>
to convert the Avro data to Parquet data. This is a map-only job that simply
sets up the right input and output format according to what we want. <o:p></o:p></span></div>
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
<span style="mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><br /></span></div>
<div class="MsoNormal" style="mso-margin-bottom-alt: auto; mso-margin-top-alt: auto;">
2. After compilation, let’s run the
job on our existing Avro data (here the schema and the Avro file have already been copied to HDFS):<span style="font-family: Times; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$<span style="mso-spacerun: yes;"> </span>hadoop jar
avro2parquet.jar hdfs:///user/cloudera/twitter.avsc</span><span style="mso-spacerun: yes;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">hdfs:///user/cloudera/inputdir hdfs:///user/cloudera/outputdir<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
We get:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ hadoop fs -ls /user/cloudera/outputdir<br />
<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">Found 3 items<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">-rw-r--r--<span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">/user/cloudera/outputdir2/_SUCCESS<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">-rw-r--r--<span style="mso-spacerun: yes;"> </span>1 cloudera
cloudera<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">/user/cloudera/outputdir2/_metadata<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">-rw-r--r--<span style="mso-spacerun: yes;"> </span>1 cloudera
cloudera<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">/user/cloudera/outputdir2/part-m-00000.snappy.parquet<br />
<o:p></o:p></span></div>
</div>
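<div class="MsoNormal">
A quick sanity check on the generated part file: a Parquet file begins and ends with the four magic bytes PAR1. The sketch below (Python standard library only, function names are my own) tests for them on a local copy, which you can fetch first with hadoop fs -get. It does not validate the footer or the data pages, only the magic bytes.</div>

```python
PARQUET_MAGIC = b"PAR1"


def has_parquet_magic(data):
    """Return True if `data` begins and ends with the Parquet magic bytes."""
    # 4 magic bytes + 4-byte footer length + 4 magic bytes is the bare minimum.
    if len(data) < 12:
        return False
    return data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC


def is_parquet_file(path):
    """Check a local file; it reads the whole file, so use it on small files."""
    with open(path, "rb") as handle:
        return has_parquet_magic(handle.read())
```

<div class="MsoNormal">
For example, is_parquet_file("part-m-00000.snappy.parquet") on the local copy of the part file listed above.</div>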
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Note that the Avro schema is converted directly to a
Parquet-compatible format.<o:p></o:p></div>
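<div class="MsoNormal">
The Hive table used to query this data has to declare columns that line up with the Avro fields. As an illustration, here is a small sketch (Python standard library only) that derives such a statement from the schema; AVRO_TO_HIVE and hive_parquet_ddl are hypothetical helpers, and only flat records of primitive types are handled (no unions, nested records, or quoting of column names).</div>

```python
import json

# Avro primitive type -> Hive column type, for the primitives covered by this sketch.
AVRO_TO_HIVE = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "bytes": "binary",
}


def hive_parquet_ddl(table_name, avro_schema):
    """Build a CREATE TABLE ... STORED AS PARQUET statement from a flat Avro record schema."""
    columns = ", ".join(
        "{} {}".format(field["name"], AVRO_TO_HIVE[field["type"]])
        for field in avro_schema["fields"]
    )
    return "create table {} ({}) STORED AS PARQUET;".format(table_name, columns)


# Field names and types taken from twitter.avsc (doc attributes omitted).
schema = json.loads("""
{ "type": "record", "name": "twitter_schema",
  "fields": [ {"name": "username", "type": "string"},
              {"name": "tweet", "type": "string"},
              {"name": "timestamp", "type": "long"} ] }
""")
print(hive_parquet_ddl("tweets_parquet", schema))
```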
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
3. Now let’s test our result in Hive. We first create a
Parquet table (note the simple syntax in Hive 0.14+), then move the file we
just created into that table with a LOAD DATA command, and finally query our converted data directly.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">hive> <b style="mso-bidi-font-weight: normal;">create table
tweets_parquet (username string, tweet string, timestamp bigint)<span style="mso-spacerun: yes;">
</span>STORED AS PARQUET;</b> <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">OK<o:p></o:p></span></div>
</div>
<div style="background: white; border: solid #E7DEC3 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoListParagraphCxSpFirst" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal; font-style: normal; font-variant: normal; line-height: normal;"><b> </b></span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">load data inpath
'/user/cloudera/outputdir/part-m-00000.snappy.parquet' overwrite into table
tweets_parquet;<br />
<o:p></o:p></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">Loading data to table default.tweets_parquet<br />
chgrp: changing ownership of
'hdfs://quickstart.cloudera:8020/user/hive/warehouse/tweets_parquet/part-m-00000.snappy.parquet':
User does not belong to hive<br />
Table default.tweets_parquet stats: [numFiles=1, numRows=0, totalSize=1075, rawDataSize=0]<br />
OK<br />
Time taken: 6.712 seconds<br />
hive<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p> </o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span><b style="mso-bidi-font-weight: normal;">select * from tweets_parquet;<o:p></o:p></b></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">OK<br />
<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="font-family: Wingdings; mso-bidi-font-family: Wingdings; mso-fareast-font-family: Wingdings;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span><br />
miguno<span style="mso-tab-count: 1;"> </span>Rock:
Nerf paper, scissors is fine.<span style="mso-tab-count: 1;"> </span>1366150681<br />
BlizzardCS<span style="mso-tab-count: 1;"> </span>Works as intended.<span style="mso-spacerun: yes;"> </span>Terran is IMBA.<span style="mso-tab-count: 1;"> </span>1366154481<br />
Time taken: 1.107 seconds, Fetched: 2 row(s)<br />
</span>Parquet
with Avro<o:p></o:p></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Let’s verify our Parquet schema now that the data has been
converted; note that the schema still refers to Avro:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; margin-left: .25in; margin-right: 0in; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoListParagraphCxSpFirst" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p> </o:p></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ hadoop parquet.tools.Main
schema outputdir/part-m-00000.snappy.parquet<o:p></o:p></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p> </o:p></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">message com.miguno.avro.Tweet {<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>required binary username
(UTF8);<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>required binary tweet
(UTF8);<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>required int64
timestamp;<br />
}<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><br /></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p> </o:p></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p> </o:p></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ hadoop parquet.tools.Main
meta outputdir/part-m-00000.snappy.parquet<o:p></o:p></span></b></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">creator:<span style="mso-spacerun: yes;"> </span>parquet-mr <br />
extra:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>avro.schema =
{"type":"record","name":"Tweet","namespace"
<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">file schema: com.miguno.avro.Tweet<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span><br />
------------------------------------------------------<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">username:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>REQUIRED BINARY O:UTF8
R:0 D:0<br />
tweet:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>REQUIRED BINARY O:UTF8
R:0 D:0<br />
timestamp:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span><span style="mso-spacerun: yes;"> </span>REQUIRED INT64 R:0 D:0<br />
<br />
row group 1: RC:2
TS:297<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">---------------------------------------------------------<br />
username:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>BINARY SNAPPY DO:0
FPO:4 SZ:67/65/0.97 VC:2 ENC:PLAIN,BIT_PACKED<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">tweet:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>BINARY SNAPPY DO:0
FPO:71 SZ:176/175/0.99 VC:2 ENC:PLAIN<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">timestamp:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><span style="mso-spacerun: yes;"> </span>INT64 SNAPPY DO:0
FPO:247 SZ:59/57/0.97 VC:2 ENC:PLAIN,BIT_PACKED<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p> </o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="background: white; border: none; line-height: 17.4pt; margin-left: .25in; mso-add-space: auto; mso-border-alt: solid #E7DEC3 .75pt; mso-list: l1 level1 lfo1; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; text-indent: -.25in; vertical-align: baseline;">
<!--[if !supportLists]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Wingdings; font-size: 10.0pt; mso-bidi-font-family: Wingdings; mso-border-alt: none windowtext 0in; mso-fareast-font-family: Wingdings; padding: 0in;"><span style="mso-list: Ignore;">Ø<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<!--EndFragment--></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<o:p>That concludes our exercise! Let me know if you have additional questions.</o:p></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-65322154125265122972015-02-26T21:08:00.001-08:002015-02-26T21:08:51.529-08:00Hadoop world / Strata 2015 overview<h2>
A few notes about Strata 2015</h2>
I have been going to Strata for a few years now, so I am pretty familiar with the Hadoop vendors and offerings on show. Here are a few general thoughts about the event and what I noted.<br />
<br />
<h3>
Overview</h3>
A lot more companies/players in the Big Data space in general. Of note, in addition to the "regulars", there are a lot more niche players, and a few behemoths (HP, Intel, Microsoft) trying to capitalize on Hadoop.<br />
<br />
<h3>
Trending this year</h3>
<h4>
Big data in the cloud</h4>
A few companies now offer Hadoop-as-a-service (as well as other frameworks) in the cloud, in addition to IT or application-level features: Altiscale, Qubole, Datameer, etc. Apparently most of them are doing well, and there is enough room to accommodate everyone. I heard Qubole in particular is doing well.<br />
<br />
<h4>
Separation of concerns/Specialization of Hadoop tools</h4>
It seems like vendors offer either a one-stop shop for Hadoop, like Business Intelligence/Analytics tools (Platfora, Pentaho, etc.), with the standard advantages and shortcomings that an off-the-shelf product may imply, or very specialized tools, like data discovery (Tamr), data cleansing (Paxata) or visualization (Zoomdata). Pick your weapon!<br />
Of note: why was Google not there?<br />
<h4>
Stream processing</h4>
More interestingly, batch analytics is becoming commoditized, with a number of tools available to perform these kinds of processes. A newer type of application on offer is near-real-time (NRT) stream processing. DataTorrent, RapidMiner, and especially Interana are among these companies. This is to counteract the fact that open source tools like Storm and Spark Streaming are not for the faint of heart to implement.<br />
<br />
<h4>
Data discovery</h4>
This is a new offering among startups: the ability to auto-discover your sources of data and manage them automatically. This is essentially what used to be called MDM and CDC in the "old" data warehouse world, and it is partially solved via tools like Apache Falcon in the downstream ecosystem of tools. See <a href="http://matthieulieber.blogspot.com/2014/11/productionalization-of-hadoop.html">my post</a> on this.<br />
Instead, these companies (Tamr, Alation, Attivio) offer the ability to expose your data and its relationships through a combination of automation and machine learning tools.<br />
<br />
<h4>
Data Science/Machine Learning</h4>
I was stunned by the proliferation of startups around data science: H2O, Dato, Prediction.io, Skytree, Dataiku, etc. It seems like there is a lot of redundancy in the space. One company seemingly standing out from the pack: DataRobot, which apparently won some Kaggle competition.<br />
<br />
Of note, but you knew that already: Spark is omnipresent.<br />
<br />
<br />
<h4>
My personal Awards</h4>
<br />
Best T-shirt: Datameer, Databricks<br />
Best toys: DataRobot<br />
Biggest booth for the smallest funding in a company: Tamr<br />
<br />
<br />
<br />Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com3tag:blogger.com,1999:blog-170648781806274754.post-19665519219757160172015-02-17T11:47:00.000-08:002015-02-17T12:11:02.108-08:00Introduction to Prediction.IO, an open-source Machine Learning framework<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<div class="MsoTitle">
<span style="font-size: x-large;"> What is Prediction.IO in a nutshell?</span></div>
</div>
<h1>
<span style="font-size: x-large;"><o:p></o:p></span></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span>Building
a machine learning application from scratch is hard; you need to be able to
work with your own data and train your algorithm on it, build a
layer to serve the prediction results, manage the different algorithms you are
running and their evaluations, deploy your application in production, manage the
dependencies on your other tools, etc. <o:p></o:p></div>
<div class="MsoNormal">
Prediction.io is an open source Machine Learning server that
addresses these concerns. It aims to be the “LAMP stack” for data analytics.<span style="font-family: Times; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h1>
Current state of Machine Learning frameworks<o:p></o:p></h1>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span><o:p></o:p></div>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span>Let's first
review some of the tools that are currently popular in the Machine Learning
(ML) community. Some widely used tools are: Mahout in the Hadoop ecosystem,
MLlib in the Spark community, H2O, and DeepLearning4j.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
These APIs generally work great and provide implementations
of the main ML algorithms. However, what is generally missing
in order to use them in a production environment?<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->An integration layer to bring your data sources<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->A framework to roll a prototype into production<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->A simple API to query the results<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Example<o:p></o:p></h2>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span>Let’s take
a classic recommender as an example; usually, the predictive model uses
users’ behavior to recommend products. <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
We first read the training data (here, a comma-separated text file) with Spark:<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">// Read training data <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">val trainingData =
sc.textFile("trainingData.txt").map(_.split(',') match {..})<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
which yields something like:<o:p></o:p></div>
<div class="MsoNormal">
user1 purchases product1, product2<o:p></o:p></div>
<div class="MsoNormal">
user2 purchases product2<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
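<div class="MsoNormal">
The parsing step elided as <code>{..}</code> in the snippet above depends on your file layout. As a rough sketch only, assuming each line looks like <code>userId,productId,rating</code> (an assumption, not the format from the original data), it could be written as below. The <code>Rating</code> case class is a local stand-in that mirrors the (user, product, rating) shape of MLlib's <code>Rating</code> so the sketch runs without Spark; <code>TrainingDataSketch</code> and <code>parseLine</code> are hypothetical names.</div>

```scala
// Local stand-in mirroring the (user, product, rating) shape of MLlib's Rating,
// so this sketch runs without Spark on the classpath.
final case class Rating(user: Int, product: Int, rating: Double)

object TrainingDataSketch {
  // Assumed line format: "userId,productId,rating", e.g. "1,101,1.0".
  // Returns None for malformed lines instead of throwing.
  def parseLine(line: String): Option[Rating] =
    line.split(',').map(_.trim) match {
      case Array(user, product, rating) =>
        scala.util.Try(Rating(user.toInt, product.toInt, rating.toDouble)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1,101,1.0", "1,102,1.0", "2,102,1.0", "not a rating")
    // Keep only the lines that parse cleanly.
    val ratings = lines.flatMap(line => parseLine(line))
    ratings.foreach(println)
  }
}
```

<div class="MsoNormal">
With Spark, the same function could be applied to the lines read by <code>sc.textFile</code> (for instance through <code>flatMap</code>), swapping the stand-in for MLlib's own <code>Rating</code>.</div>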
<div class="MsoNormal">
Then build a predictive model with an algorithm:<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">// collaborative filtering (ALS): rank = 10, iterations = 20, lambda = 0.01<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">val </span>model = ALS.train(trainingData, 10, 20, 0.01)<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Then start using the model:<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">// recommend the top 5 products for each user<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
allUsers.foreach {
user => model.recommendProducts(user, 5) }<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This recommends 5 products for each user.<o:p></o:p></div>
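<div class="MsoNormal">
Conceptually, asking for 5 recommendations means keeping the 5 products with the highest predicted score for a given user. Here is a minimal, Spark-free sketch of that top-N selection. The names <code>TopNSketch</code>, <code>Scored</code> and <code>topN</code> are hypothetical and the scores are made up; this is an illustration, not MLlib's implementation.</div>

```scala
object TopNSketch {
  // Hypothetical predicted score for one product. In MLlib the scores would
  // come from the trained model; here they are hard-coded for illustration.
  final case class Scored(product: Int, score: Double)

  // Keep the n highest-scoring products, best first.
  def topN(candidates: Seq[Scored], n: Int): Seq[Scored] =
    candidates.sortBy(c => -c.score).take(n)

  def main(args: Array[String]): Unit = {
    val predictedForUser1 = Seq(Scored(101, 0.2), Scored(102, 0.9), Scored(103, 0.5))
    topN(predictedForUser1, 2).foreach(println)
  }
}
```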
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This code will work in a development environment, but wouldn’t
work in production. Why?<br />
- How do you integrate with your existing data?<o:p></o:p></div>
<div class="MsoNormal">
- How do you unify the data from multiple sources?<o:p></o:p></div>
<div class="MsoNormal">
- How do you deploy a scalable service that responds to dynamic
prediction queries?<o:p></o:p></div>
<div class="MsoNormal">
- How do you persist the predictive model, in a distributed
environment?<o:p></o:p></div>
<div class="MsoNormal">
- How to make your storage layer, Spark, and the algorithms
talk to each other?<o:p></o:p></div>
<div class="MsoNormal">
- How to prepare the data for model training?<o:p></o:p></div>
<div class="MsoNormal">
- How to update the model with new data, without downtime?<o:p></o:p></div>
<div class="MsoNormal">
- Where does the business logic get added?<o:p></o:p></div>
<div class="MsoNormal">
- How to make the code configurable, reusable and manageable?<o:p></o:p></div>
<div class="MsoNormal">
- How do we build these with separation of concerns (SoC),
as is done on the web development side of things?<o:p></o:p></div>
<div class="MsoNormal">
- How to make things work in a real time environment?<o:p></o:p></div>
<div class="MsoNormal">
- How do I customize the recommender on a per-location
basis? How do I discard items that are out of stock?<o:p></o:p></div>
<div class="MsoNormal">
- How about performing different tests on the algorithms you
selected? <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
Prediction.io to the rescue!<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Let’s address the above questions.<o:p></o:p></div>
<div class="MsoNormal">
Prediction.io provides an event server that collects and stores
data from multiple channels (say, a mobile app or a website) <b style="mso-bidi-font-weight: normal;">in a unified way</b>.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
You can plug <b style="mso-bidi-font-weight: normal;">multiple
engines</b> into Prediction.io; each engine represents one type of prediction
problem. Why is that important? <o:p></o:p></div>
<div class="MsoNormal">
In a production system, you will typically use multiple engines.
Take the archetypal example of Amazon: if you bought this, recommend that. But
you may also run a different algorithm on the front page for article discovery,
and another one for retargeting email campaigns based on what you
browsed. <o:p></o:p></div>
<div class="MsoNormal">
Prediction.io does that very well.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
How do you deploy a predictive model service? In a typical mobile
app, user actions are sent as behavior data. Your prediction model is
trained on these, and the Prediction.io engine is deployed as <b style="mso-bidi-font-weight: normal;">a Web service</b>, so your mobile app
can communicate with the engine via a REST API. If that is not
sufficient, <b style="mso-bidi-font-weight: normal;">SDKs</b> are
available in different languages. The engine returns a list of results in
JSON format. <o:p></o:p></div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHJKQMLdb7foSDbIK4UFiCKwnIUzuxCehD7z952TidzpapgqCURZmCaO2v2IYDT48B6BcBH_O2XQWil2-ZJ3NMvGoBIRWcaTCJr4LEyiKLaweyJ8re-FslEsSSbyzdnyABgwiB3e5UxOAh/s1600/Screen+Shot+2015-02-17+at+11.44.51+AM.png" imageanchor="1" style="margin-left: auto; margin-right: auto; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHJKQMLdb7foSDbIK4UFiCKwnIUzuxCehD7z952TidzpapgqCURZmCaO2v2IYDT48B6BcBH_O2XQWil2-ZJ3NMvGoBIRWcaTCJr4LEyiKLaweyJ8re-FslEsSSbyzdnyABgwiB3e5UxOAh/s1600/Screen+Shot+2015-02-17+at+11.44.51+AM.png" height="352" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Prediction.io interaction with a mobile app</td></tr>
</tbody></table>
</div>
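<div class="MsoNormal">
As a rough illustration of the REST interaction described above, here is a minimal sketch of a client querying a deployed engine. The endpoint (http://localhost:8000/queries.json) and the query fields "user" and "num" are assumptions based on a typical recommendation template, not something stated in this post; adjust them to whatever your engine expects. The JSON is built by hand and the user ID is not escaped, so this is only suitable for simple IDs.</div>

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object EngineClient {
  // Build the JSON query body by hand. The field names "user" and "num"
  // are assumptions (typical of a recommendation template).
  def queryBody(userId: String, num: Int): String =
    s"""{"user": "$userId", "num": $num}"""

  // POST the query to the engine and return the raw JSON response.
  // The default endpoint is an assumption (a locally deployed engine).
  def sendQuery(body: String,
                endpoint: String = "http://localhost:8000/queries.json"): String = {
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setRequestProperty("Content-Type", "application/json")
      conn.setDoOutput(true)
      val out = conn.getOutputStream
      try out.write(body.getBytes(StandardCharsets.UTF_8)) finally out.close()
      val src = Source.fromInputStream(conn.getInputStream, "UTF-8")
      try src.mkString finally src.close()
    } finally conn.disconnect()
  }
}
```

<div class="MsoNormal">
Once an engine is running, a client would call EngineClient.sendQuery(EngineClient.queryBody("1", 4)) and parse the JSON list that comes back.</div>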
<div>
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Prediction.io manages <b style="mso-bidi-font-weight: normal;">the
dependencies</b> of Spark, HBase, and the algorithms automatically. You can
launch it with a one-line command.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The framework does not act as <b style="mso-bidi-font-weight: normal;">a black box</b>: Prediction.io is one of
the most popular ML products on GitHub (5000+ contributors).<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The framework is open source and is written in <b style="mso-bidi-font-weight: normal;">Scala</b>, which takes advantage of the JVM
and is a natural fit for distributed computing. R, in comparison, is not
so easy to scale. Prediction.io also uses <b style="mso-bidi-font-weight: normal;">Spark</b>,
currently one of the best distributed computing frameworks available, and one proven to
scale in production. Algorithms are implemented via MLlib. Lastly, events are
stored in Apache HBase, which serves as the NoSQL storage layer.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Preparing the data
for model training</b> is a matter of running the event server (launched via
‘pio eventserver’) and sending it events. Each event defines an action (e.g. change
a product’s price, or give a rating A to product x), the product name, and
attribute names, all in free format.<span style="mso-spacerun: yes;"> </span><o:p></o:p></div>
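<div class="MsoNormal">
To make the event format more concrete, here is a minimal sketch that builds the JSON body of an event before it is posted to the event server. The field names (event, entityType, entityId, targetEntityType, targetEntityId, properties) follow my understanding of the event API and should be treated as assumptions, as should the "$set" event name used in the note below; the Event case class and the toJson helper are hypothetical names made up for this example.</div>

```scala
object EventPayload {
  // One event: an entity (e.g. a user) performs an action, optionally on a target (e.g. an item).
  final case class Event(
      event: String,
      entityType: String,
      entityId: String,
      targetEntityType: Option[String] = None,
      targetEntityId: Option[String] = None,
      properties: Map[String, Double] = Map.empty)

  // Quote a string for JSON, escaping backslashes and double quotes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Serialize the event by hand; optional fields are left out when absent.
  def toJson(e: Event): String = {
    val required = Seq(
      "event" -> quote(e.event),
      "entityType" -> quote(e.entityType),
      "entityId" -> quote(e.entityId))
    val optional = Seq(
      e.targetEntityType.map(v => "targetEntityType" -> quote(v)),
      e.targetEntityId.map(v => "targetEntityId" -> quote(v))).flatten
    val props: Seq[(String, String)] =
      if (e.properties.isEmpty) Seq.empty
      else Seq("properties" -> e.properties.toSeq.sortBy(_._1)
        .map { case (k, v) => s"${quote(k)}:$v" }
        .mkString("{", ",", "}"))
    (required ++ optional ++ props)
      .map { case (k, v) => s"${quote(k)}:$v" }
      .mkString("{", ",", "}")
  }
}
```

<div class="MsoNormal">
A rating would be Event("rate", "user", "u1", Some("item"), Some("i1"), Map("rating" -> 4.0)), while a price change could be expressed as Event("$set", "item", "i1", properties = Map("price" -> 9.99)). The resulting string is what you would POST to the event server.</div>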
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Building the engine is made easy because Prediction.io
offers templates for recommendation and classification. The engine is built on
an MVC-like architecture, and has the following components:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- <b style="mso-bidi-font-weight: normal;">Data source</b>:
data comes from any data source, and is preprocessed automatically into the
desired format. Data is prepared and cleansed according to what the engine
expects. This follows the Separation of Concerns concept.<o:p></o:p></div>
<div class="MsoNormal">
- <b style="mso-bidi-font-weight: normal;">Algorithms</b>: ML algorithms
at your disposal to do what you need; ability to combine multiple algorithms.<o:p></o:p></div>
<div class="MsoNormal">
- <b style="mso-bidi-font-weight: normal;">Serving layer</b>:
ability to serve results based on predictions, and add custom business logic to
them.<o:p></o:p></div>
<div class="MsoNormal">
- <b style="mso-bidi-font-weight: normal;">Evaluator layer</b>:
ability to evaluate the performance of the prediction to compare algorithms.<o:p></o:p><br />
<br />
Of note, MLlib has recently improved its API to <a href="https://databricks.com/blog/2015/01/07/ml-pipelines-a-new-high-level-api-for-mllib.html">address some of these concerns</a> (e.g. creating an ML pipeline).</div>
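<div class="MsoNormal">
The separation between these components can be sketched in plain Scala. This is not Prediction.io's actual API: the trait names, the toy popularity algorithm, and the serve function are all hypothetical, chosen only to show where data preparation, training, and business logic (here, dropping out-of-stock items) would each live.</div>

```scala
object EngineSketch {
  final case class Rating(user: String, item: String, score: Double)
  final case class Scored(item: String, score: Double)

  // Data source: reads raw events and hands cleansed training data to the algorithms.
  trait DataSource { def read(): Seq[Rating] }
  final class InMemorySource(events: Seq[Rating]) extends DataSource {
    // Cleansing step: keep only ratings on the expected 1 to 5 scale.
    def read(): Seq[Rating] = events.filter(r => r.score >= 1.0 && r.score <= 5.0)
  }

  // Algorithm: trains on prepared data and returns a predictor (user, n) => top items.
  trait Algorithm { def train(data: Seq[Rating]): (String, Int) => Seq[Scored] }

  // Highest score first; ties are broken by item name so the output is deterministic.
  private def ranked(items: Seq[Scored], n: Int): Seq[Scored] =
    items.sortWith((a, b) => a.score > b.score || (a.score == b.score && a.item < b.item)).take(n)

  // Toy algorithm: rank items by mean rating, skipping items the user already rated.
  object PopularityAlgorithm extends Algorithm {
    def train(data: Seq[Rating]): (String, Int) => Seq[Scored] = {
      val means = data.groupBy(_.item)
        .map { case (item, rs) => Scored(item, rs.map(_.score).sum / rs.size) }.toSeq
      val seen = data.groupBy(_.user).map { case (u, rs) => u -> rs.map(_.item).toSet }
      (user, n) => {
        val already = seen.getOrElse(user, Set.empty[String])
        ranked(means.filterNot(s => already(s.item)), n)
      }
    }
  }

  // Serving: averages the scores from several algorithms, then applies business rules.
  def serve(predictions: Seq[Seq[Scored]], inStock: Set[String], n: Int): Seq[Scored] = {
    val combined = predictions.flatten.groupBy(_.item)
      .map { case (item, ss) => Scored(item, ss.map(_.score).sum / ss.size) }.toSeq
    ranked(combined.filter(s => inStock(s.item)), n)
  }
}
```

<div class="MsoNormal">
Because each concern sits behind its own interface, swapping the data source, adding a second algorithm, or changing the stock rule does not require touching the other parts.</div>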
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In summary, Prediction.io believes the <b style="mso-bidi-font-weight: normal;">functions of an engine</b> should be to:<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Train deployable predictive model(s)<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Respond to dynamic queries<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Evaluate the algorithm being used<o:p></o:p></div>
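<div class="MsoNormal">
For the evaluation function, one common way to compare recommenders is precision at k on held-out data. The sketch below uses that metric as an example; the post does not say which metrics Prediction.io ships with, so the metric choice and the function names are assumptions.</div>

```scala
object EvaluationSketch {
  // Precision@k: the fraction of the top-k recommended items that the user actually liked.
  def precisionAtK(recommended: Seq[String], relevant: Set[String], k: Int): Double =
    if (k <= 0) 0.0
    else recommended.take(k).count(relevant).toDouble / k

  // Average the metric over all users, so that two algorithms can be
  // compared on the same held-out data.
  def meanPrecisionAtK(recs: Map[String, Seq[String]],
                       heldOut: Map[String, Set[String]],
                       k: Int): Double =
    if (recs.isEmpty) 0.0
    else recs.map { case (user, items) =>
      precisionAtK(items, heldOut.getOrElse(user, Set.empty[String]), k)
    }.sum / recs.size
}
```

<div class="MsoNormal">
Running meanPrecisionAtK once per candidate algorithm, on the same held-out ratings, gives a single number per algorithm to compare.</div>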
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
How to get started?<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The best way to start is to:<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Download the code from <a href="https://github.com/predictionio">GitHub</a><o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="mso-spacerun: yes;"> </span>Get one
of the <a href="http://docs.prediction.io/templates/vanilla/quickstart/">templates</a>:
everything you need is already laid out and set up, and the
template can be modified according to your needs.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The whole stack can be installed with a single command. You
can then start the event server, deploy the engine, and update the engine model with
new data.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-48317174446433729802015-02-05T16:06:00.002-08:002015-02-16T16:09:36.366-08:00Navigating from Scala to Spark for distributed programming<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<br /></div>
<div class="MsoNormal">
<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In this post I will review Scala in a little more depth, and
attempt to demonstrate why it is a natural fit to use with Spark in the context
of distributed systems.<o:p></o:p></div>
<div class="MsoNormal">
This post is derived from what I learned, in particular from <a href="http://confreaks.com/videos/4841-PNWS2014-apache-spark-i-from-scala-collections-to-fast-interactive-big-data-with-spark">this
video</a>, as well as various other sources, and aims to capture and reconcile
this knowledge.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Why is Scala a natural fit for distributed programming?<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Scala offers concurrent programming, Java interoperability
and functional programming out of the box. In addition, the Collections API is
a first-class citizen in Scala, with its comprehensive list of available data
structures, ability to perform functional transformations, and immutability.
Another perk vis-à-vis distributed programming is that Scala can essentially go
from sequential, to parallel, to distributed programming with the same API.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Here is an example:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>Example of sequential code (in Scala):<o:p></o:p></u></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">scala> List(1,3,5,7).map(_ * 2)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">res0: List[Int] = List(2, 6, 10, 14)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">map</b>() (a pure <u>Scala</u>
API) is inherently a parallelizable operation: the data can be divided
into splits, and the mapping function applied to each split independently.<o:p></o:p></div>
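<div class="MsoNormal">
To see why map() parallelizes so naturally, the small sketch below applies the function split by split and then concatenates the partial results. The helper name mapBySplits is made up for this illustration; the point is only that the result is the same as a plain map(), whatever the split size.</div>

```scala
object SplitMap {
  // Apply f to each split independently, then concatenate the partial results in order.
  def mapBySplits[A, B](xs: List[A], splitSize: Int)(f: A => B): List[B] =
    xs.grouped(splitSize)          // divide: the list is cut into chunks of splitSize elements
      .map(split => split.map(f))  // conquer: each split could run on a different core or machine
      .toList
      .flatten                     // combine: stitch the partial results back together
}
```

<div class="MsoNormal">
Since no split depends on another, SplitMap.mapBySplits(List(1,3,5,7), 2)(_ * 2) yields the same list as List(1,3,5,7).map(_ * 2).</div>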
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>Example of parallel code (in Scala):<o:p></o:p></u></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">scala> List(1,3,5,7).par.map(_ * 2)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">res0: scala.collection.parallel.immutable.ParSeq[Int] =
ParVector(2, 6, 10, 14)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">par</b>() in Scala
lets you <a href="http://docs.scala-lang.org/overviews/parallel-collections/overview.html">parallelize
your collection</a> across your machine's cores, essentially swapping the
sequential processing of that collection for multi-core processing.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>Example of distributed code (in Spark). <o:p></o:p></u></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">scala> intRDD.map(_ * 2).take(100) <o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The code is essentially the same, but the collection is an
RDD this time. An RDD can be obtained by applying <b style="mso-bidi-font-weight: normal;">parallelize</b>() (which is a Spark API, different from <b style="mso-bidi-font-weight: normal;">par</b>()) to a simple Scala collection,
this time taking advantage of the distributed environment. See the section on
Laziness for an example.<o:p></o:p></div>
<div class="MsoNormal">
Also, an action (take() in our example) needs to be called on
the data in order to get results, because Spark evaluates RDD transformations
lazily (more on this later).<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In distributed mode, <b style="mso-bidi-font-weight: normal;">map</b>()
produces a new RDD from the result.<o:p></o:p></div>
<div class="MsoNormal">
Map functions must be serializable so that they can be shipped over the network. Note that an RDD is immutable, i.e. its state
cannot be changed over time. It can be discarded if needed to free up some
memory. To summarize, an RDD has the following properties: immutability,
iterability, serializability, distributed-mode, and laziness.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Spark operations<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>Within an RDD:<o:p></o:p></u></div>
<div class="MsoNormal">
map, filter, groupBy, sample<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>Cross RDDs:<o:p></o:p></u></div>
<div class="MsoNormal">
join, cartesian, cogroup (<a href="http://joshualande.com/cogroup-in-pig/">similar to
Pig’s CoGroup</a>; essentially a groupBy + join on 2 RDDs)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>RDD actions:<o:p></o:p></u></div>
<div class="MsoNormal">
reduce, count, collect, take, foreach<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>RDD Optimizations:<o:p></o:p></u></div>
<div class="MsoNormal">
coalesce (reduces the number of partitions; despite the name, it is unrelated to SQL’s COALESCE function), pipe, repartition<o:p></o:p></div>
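<div class="MsoNormal">
To make the cogroup description above more concrete, here is a minimal sketch of its semantics using plain Scala collections rather than Spark. The helper name cogroupLocal and the sample data are hypothetical, and a real RDD cogroup additionally shuffles data across the cluster.</div>

```scala
// Local, single-machine sketch of cogroup semantics (not Spark's implementation).
// For every key present in either input, collect all values from the left
// side and all values from the right side.
def cogroupLocal[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Map[K, (Seq[A], Seq[B])] = {
  // groupBy step: gather the values of each key on both sides
  val l: Map[K, Seq[A]] = left.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
  val r: Map[K, Seq[B]] = right.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
  // join step: keep the keys of both sides, with an empty Seq when a side has no match
  (l.keySet ++ r.keySet).map { k =>
    (k, (l.getOrElse(k, Seq.empty[A]), r.getOrElse(k, Seq.empty[B])))
  }.toMap
}

// Hypothetical sample data
val visits = Seq(("alice", "home"), ("bob", "cart"), ("alice", "checkout"))
val orders = Seq(("alice", 42), ("carol", 7))
val grouped = cogroupLocal(visits, orders)
// grouped("alice") holds (List(home, checkout), List(42))
// grouped("bob") holds (List(cart), List())
// grouped("carol") holds (List(), List(7))
```

<div class="MsoNormal">
Unlike a plain inner join, keys that appear on only one side are kept, paired with an empty sequence for the other side.</div>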
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Laziness of Scala<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Scala is strict by default, but it offers lazy constructs (lazy val, views, streams) for which the computation is not triggered
until you ask for results. The key to laziness is that the tail of the data is
not evaluated. <o:p></o:p></div>
<div class="MsoNormal">
For example, a stream is a lazy list:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">scala> List(1,3,5,7).toStream<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">res0: scala.collection.immutable.Stream[Int] = Stream(1, ?)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">scala> res0.map(_ * 2)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">res1:
scala.collection.immutable.Stream[Int] = Stream(2, ?)<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
I.e. the tail is not evaluated until you ask for it (for
example via toList).<o:p></o:p></div>
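<div class="MsoNormal">
One way to observe this laziness is to count how often the mapping function actually runs. The sketch below is illustrative only: the names evaluations and doubleIt are made up for this example, and it assumes a Scala version where Stream is still available (it is deprecated in favour of LazyList since Scala 2.13).</div>

```scala
// Count how many times the mapping function is really invoked.
var evaluations = 0
def doubleIt(x: Int): Int = { evaluations += 1; x * 2 }

// Mapping over a Stream only computes the head eagerly; the tail is deferred.
val doubled = List(1, 3, 5, 7).toStream.map(doubleIt)
val afterMap = evaluations      // 1: only the head has been computed so far

// Asking for a strict List forces the remaining elements.
val all = doubled.toList
val afterToList = evaluations   // 4: every element has now been computed once
```

<div class="MsoNormal">
Because a Stream memoizes its elements, forcing it a second time would not call doubleIt again.</div>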
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This is really useful in Spark. Laziness lets Spark do as little
work as possible, so less memory is used for intermediate results. Also, Spark
can optimize the execution plan once all the transformations are known. E.g.,
all the map steps can be executed within the same phase, very much like <a href="http://tez.apache.org/">Tez</a> does in the M/R world.<o:p></o:p></div>
<div class="MsoNormal">
As an aside, this is also a problem: if the computation runs
as a highly optimized bundle, it is not easy to <a href="http://blog.explainmydata.com/2014/05/spark-should-be-better-than-mapreduce.html?m=1">debug
it</a>. Thankfully, some Spark profilers are starting to <a href="http://blog.sematext.com/2014/10/07/apache-spark-monitoring/">hit the
market</a>.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
parallelize will convert its argument into an RDD. Example:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">scala> sparkContext.parallelize(1 to 10).map(x =>
x * 2)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">res6: MappedRDD[2]<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal">
i.e. you don’t get the result, just the mapped RDD to be
evaluated. Each transformation is a wrapper RDD, and is a step within Spark’s
lineage for recovery purposes.<o:p></o:p></div>
<div class="MsoNormal">
As said earlier, an RDD action function such as count will
force the computation. <o:p></o:p></div>
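<div class="MsoNormal">
The split between transformations (which only record a step) and actions (which force the work) can be sketched without a cluster. The ToyRDD class and toyParallelize helper below are hypothetical names for a toy model in plain Scala; this is an illustration of the idea, not how Spark is implemented.</div>

```scala
// Toy model of deferred evaluation; an illustration only.
final class ToyRDD[A](compute: () => Iterator[A], val lineage: List[String]) {
  // Transformation: wraps the parent and records the step; nothing runs yet.
  def map[B](f: A => B): ToyRDD[B] =
    new ToyRDD[B](() => compute().map(f), lineage :+ "map")

  // Actions: these finally pull data through the whole chain.
  def count(): Int = compute().size
  def take(n: Int): List[A] = compute().take(n).toList
}

// Stand-in for parallelize: wraps a local collection, no distribution involved.
def toyParallelize[A](data: Seq[A]): ToyRDD[A] =
  new ToyRDD[A](() => data.iterator, List("parallelize"))

var calls = 0
val mapped = toyParallelize(1 to 10).map { x => calls += 1; x * 2 }
val callsBeforeAction = calls     // still 0: only the lineage has been built
val firstThree = mapped.take(3)   // the action triggers the computation
val callsAfterAction = calls      // now greater than 0
```

<div class="MsoNormal">
The lineage list plays the role of the recorded chain of steps: because each wrapper knows how to recompute its data from its parent, a lost result can be rebuilt by replaying the chain.</div>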
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Types of caching<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Different configuration settings let you serialize to
memory (the default mode), to disk, or to a combination of the two.<o:p></o:p></div>
<div class="MsoNormal">
On the other hand, <a href="http://tachyon-project.org/">Tachyon</a>,
a memory-centric distributed file system, is an experimental configuration
mode that works off-heap, is resilient to worker failures, and is showing
a lot of promise.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Spark SQL boasts a different caching mechanism altogether,
working as an efficient, compressed, columnar in-memory cache (using techniques such as dictionary
compression, a la Parquet/ORC). It uses less memory than Java serialization,
and is faster.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Also, Spark caching has a configurable TTL for data,
after which old references are cleaned up.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Grouping and sorting<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
How to implement TopK <span style="mso-spacerun: yes;"> </span><u>in pure Scala</u>:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">scala> </span><span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">val words = Seq("Apple", "Bear",
"Tahoe", "a", "b", "c",
"Apple", "Apple", "Bear", "c",
"c")<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">words:
Seq[String] = List(Apple, Bear, Tahoe, a, b, c, Apple, Apple, Bear, c, c)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
b = words.groupBy(x => x)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">b:
scala.collection.immutable.Map[String,Seq[String]] = Map(Bear -> List(Bear,
Bear), a -> List(a), Apple -> List(Apple, Apple, Apple), b -> List(b),
c -> List(c, c, c), Tahoe -> List(Tahoe))<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
c = b.map{ case (word, words) => (word, words.length) }<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">c:
scala.collection.immutable.Map[String,Int] = Map(Bear -> 2, a -> 1, Apple
-> 3, b -> 1, c -> 3, Tahoe -> 1)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
c1 = c.toSeq<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">c1:
Seq[(String, Int)] = ArrayBuffer((Bear,2), (a,1), (Apple,3), (b,1), (c,3),
(Tahoe,1))<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
d = c1.sortBy(_._2).reverse.take(2)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">d:
Seq[(String, Int)] = ArrayBuffer((c,3), (Apple,3))<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal">
I.e. starting from a sequence of words, first create a Map
by way of a group by. Map each word to the size of its group (i.e. its number of occurrences), then sort the entries by that count
(element #2 of each tuple in the new sequence) and reverse. Here we are taking the top 2.<o:p></o:p></div>
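<div class="MsoNormal">
The same steps can be folded into one reusable function. This is a sketch under a few assumptions: the name topK is made up for this example, and ties are broken alphabetically so that the result does not depend on the Map's internal ordering, which the step-by-step version above does not do.</div>

```scala
// Top-K most frequent words as a single function (hypothetical helper name).
def topK(words: Seq[String], k: Int): Seq[(String, Int)] =
  words
    .groupBy(identity)                                   // word -> all its occurrences
    .map { case (word, group) => (word, group.length) }  // word -> number of occurrences
    .toSeq
    .sortBy { case (word, count) => (-count, word) }     // highest count first, ties alphabetical
    .take(k)

val sample = Seq("Apple", "Bear", "Tahoe", "a", "b", "c",
                 "Apple", "Apple", "Bear", "c", "c")
val top2 = topK(sample, 2)
// top2 contains (Apple,3) followed by (c,3)
```

<div class="MsoNormal">
Sorting on the negated count avoids the separate reverse step.</div>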
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>In Spark – method 1<o:p></o:p></u></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
words = sc.parallelize(Seq("Apple", "Bear", "Tahoe",
"a", "b", "c", "Apple",
"Apple", "Bear", "c", "c")) // sc refers
to the SparkContext<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">words:
org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at
<console>:12<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
b = words.map((_, 1)) //transforms each word into a tuple, K,V where K=word,
V=1<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">b:
org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[2]<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
c = b.groupByKey <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">c:
org.apache.spark.rdd.RDD[(String, Iterable[Int])] = MappedValuesRDD[5]<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
d = c.map { // counts and sorts<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">case (word,
counts) => (word, counts.sum) }.<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">sortBy(_._2,
false).take(2) //sortBy() is another network shuffle<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal">
First, parallelize the Seq into an RDD. Then transform each
word into a (K,V) tuple where V=1. Group all instances of the same word on the same
node using groupByKey; this triggers a distributed shuffle over the network. Then, using map,
sum the counts for each word and sort the entries by count.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Caution: <o:p></o:p></div>
<div class="MsoNormal">
It’s not “words.map(_, 1)”:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
b = words.map(_,1)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"><console>:14:
error: missing parameter type for expanded function <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">((x$1) =>
words.map(x$1, 1))<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">val b =
words.map(_, 1)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"><span style="mso-spacerun: yes;"> </span>^<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt 56.0pt 84.0pt 112.0pt 140.0pt 168.0pt 196.0pt 224.0pt 3.5in 280.0pt 308.0pt 336.0pt; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
But words.map((_, 1)), because that’s equivalent to
words.map(x => (x, 1)). I.e.:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">words:
org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at &lt;console<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
b = words.map((_, 1))<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">b:
org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[2]<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt 56.0pt 84.0pt 112.0pt 140.0pt 168.0pt 196.0pt 224.0pt 3.5in 280.0pt 308.0pt 336.0pt; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt 56.0pt 84.0pt 112.0pt 140.0pt 168.0pt 196.0pt 224.0pt 3.5in 280.0pt 308.0pt 336.0pt; text-autospace: none;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">See </span><a href="http://www.artima.com/pins1ed/functions-and-closures.html#8.5">http://www.artima.com/pins1ed/functions-and-closures.html#8.5</a>
for more details.<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"><o:p></o:p></span></div>
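<div class="MsoNormal">
The placeholder rule is not specific to Spark. A minimal sketch on a plain Scala List (the object name PlaceholderDemo and the sample words are only illustrative) shows that the explicit lambda and the parenthesized placeholder build the same pairs:</div>

```scala
object PlaceholderDemo {
  val words = List("Apple", "Bear", "c")

  // Explicit lambda: each word becomes a (word, 1) pair.
  val explicit = words.map(x => (x, 1))

  // Placeholder form: the inner parentheses delimit the tuple,
  // so (_, 1) expands to x => (x, 1).
  val placeholder = words.map((_, 1))

  def main(args: Array[String]): Unit = {
    println(explicit)
    println(placeholder)
  }
}
```

<div class="MsoNormal">
Without the inner parentheses, the compiler reads the call as map applied to two arguments, which is the error shown earlier.</div>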
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>In Spark – method 2<o:p></o:p></u></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> </span>val
words = sc.parallelize(Seq("Apple", <span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">"Bear", "Tahoe",
"a", "b", "c", "Apple",
"Apple", "Bear", "c", "c")) // sc refers
to the SparkContext<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">//
parallelize() converts the Seq into an RDD<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">words:
org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at
&lt;console&gt;:12<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
b = words.map((_,1)).reduceByKey(_ + _) // reduceByKey combines values per key within each partition, then merges the partial results across partitions<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">b:
org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[14]<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> val
c = b.map { case (word, count) => (count, word) }<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">c:
org.apache.spark.rdd.RDD[(Int, String)] = MappedRDD[15]<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala>
c.top(2) //top avoids the global sort by taking the top items from each node,
and merging them at the driver.<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">res4:
Array[(Int, String)] = Array((3,c), (3,Apple))<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal">
Similar, but more efficient: reduceByKey pre-aggregates the counts within each partition before
shuffling, and top avoids a full global sort.<o:p></o:p></div>
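<div class="MsoNormal">
To make the two optimizations concrete, here is a rough sketch in plain Scala collections, with no Spark involved. The chunks produced by grouped are only a hypothetical stand-in for RDD partitions, and the helper names (localCounts, mergeCounts, topN) are made up for illustration; this is not how Spark implements these operations internally.</div>

```scala
object WordCountSketch {
  val words = Seq("Apple", "Bear", "Tahoe", "a", "b", "c",
                  "Apple", "Apple", "Bear", "c", "c")

  // Hypothetical stand-in for partitions: chunks of 4 words.
  val partitions: Seq[Seq[String]] = words.grouped(4).toList

  // Step 1 (the idea behind reduceByKey): count inside each partition first.
  def localCounts(part: Seq[String]): Map[String, Int] =
    part.groupBy(identity).map { case (w, ws) => (w, ws.size) }

  // Step 2: merge the small per-partition maps by key.
  def mergeCounts(partials: Seq[Map[String, Int]]): Map[String, Int] =
    partials.flatMap(_.toSeq).groupBy(_._1).map { case (w, kvs) => (w, kvs.map(_._2).sum) }

  // The idea behind top(n): keep the n largest items of each partition,
  // then pick the n largest among those few candidates.
  def topN(parts: Seq[Seq[(Int, String)]], n: Int): Seq[(Int, String)] =
    parts.flatMap(_.sorted.reverse.take(n)).sorted.reverse.take(n)

  val counts: Map[String, Int] = mergeCounts(partitions.map(localCounts))

  // Swap to (count, word) pairs, as in the Spark example, and chunk again.
  val swapped: Seq[Seq[(Int, String)]] =
    counts.toSeq.map { case (w, c) => (c, w) }.grouped(3).toList

  val top2: Seq[(Int, String)] = topN(swapped, 2)

  def main(args: Array[String]): Unit = println(top2)
}
```

<div class="MsoNormal">
Only the small per-partition results are merged, rather than every (word, 1) pair or a fully sorted data set.</div>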
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Perform some ETL<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Let’s now review how to perform a simple ETL use case. <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In Scala: <o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> </span>io.Source.fromFile("/tmp/myfile.csv").getLines.map(_.split(",")).<o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
map(a => (a(0),
a(2), a(5), a(6))). <o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
filter(_._2 contains
"2014").<o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
take(20)<o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
Iterator[(String,
String, String, String)]<o:p></o:p></div>
</div>
<div class="MsoNormal">
This reads the file, extracts a few fields, and filters by date.<o:p></o:p></div>
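<div class="MsoNormal">
To try the same chain without a file on disk, here is a small sketch that feeds it from an in-memory iterator. The sample rows are made up, and the assumption that column 2 holds a date is only a guess based on the filter above.</div>

```scala
object EtlSketch {
  // Hypothetical sample rows standing in for the contents of /tmp/myfile.csv.
  // Column 2 is assumed to hold a date; the other values are made up.
  val lines: List[String] = List(
    "1,x,2014-01-05,p,q,NY,10",
    "2,x,2013-07-19,p,q,CA,20",
    "3,x,2014-03-02,p,q,TX,30"
  )

  // Same chain as above, fed from an iterator of lines instead of a file.
  def etl(rows: Iterator[String]): List[(String, String, String, String)] =
    rows
      .map(_.split(","))                   // split each CSV line into fields
      .map(a => (a(0), a(2), a(5), a(6)))  // keep four of the fields
      .filter(_._2 contains "2014")        // keep rows whose second kept field mentions 2014
      .take(20)                            // cap the result at 20 rows
      .toList

  def main(args: Array[String]): Unit =
    etl(lines.iterator).foreach(println)
}
```

<div class="MsoNormal">
Passing io.Source.fromFile(...).getLines instead of lines.iterator would run the same function against a real file.</div>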
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>Parallel ETL in Scala:<o:p></o:p></u></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> </span>(1
to 4).par.flatMap(a => io.Source.fromFile("/tmp/myfile.csv")<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">.getLines).<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
map(_.split(",")).map(a
=> (a(0), a(2), a(5), a(6))). <o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
filter(_._2 contains
"2014").<o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
take(20)<o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
// Breaks down the
file into multiple chunks, converts this into a big stream, and then runs it in
parallel<o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
ParSeq[(String, String,
String, String)]<o:p></o:p></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<u>Spark ETL<o:p></o:p></u></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> </span>sc.textFile("/tmp/myfile.csv")<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">.</span>map(_.split(",")).map(a
=> (a(0), a(2), a(5), a(6))). <o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
filter(_._2 contains
"2014").<o:p></o:p></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
take(20)<o:p></o:p></div>
</div>
<div class="MsoNormal">
The code is the same, except for the initial loading of the data.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Summary<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "Helvetica Neue"; mso-bidi-font-family: "Helvetica Neue";">This should give a good idea of why Scala and
Spark are a good match. A good tutorial on Spark is here: <span style="background-color: whitesmoke; font-family: 'Helvetica Neue Light', HelveticaNeue-Light, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 13px; text-align: justify;"><a href="http://databricks.com/spark/developer-resources">http://databricks.com/spark/developer-resources</a></span>. Of note, a nice way to work with Spark in a notebook, à la IPython,
is to use this: <o:p></o:p></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="font-family: "Helvetica Neue"; mso-bidi-font-family: "Helvetica Neue";"><a href="https://github.com/andypetrella/spark-notebook">https://github.com/andypetrella/spark-notebook</a><o:p></o:p></span></div>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
</div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com2tag:blogger.com,1999:blog-170648781806274754.post-47087831996682889402015-01-14T16:33:00.000-08:002015-02-25T12:47:15.695-08:00How to load some Avro data into Spark<div class="MsoNormal">
<br /></div>
<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<h2>
How to load some Avro data into Spark</h2>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h3>
First, why use Avro?</h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The most basic format is CSV, which is not very expressive and doesn’t carry a
schema with the data.</div>
<div class="MsoNormal">
XML became popular next. It conveniently can have a schema associated with the data,
and it is commonly used in Web Services and SOA. Unfortunately it is very verbose, and parsing
XML is memory intensive.</div>
<div class="MsoNormal">
On the other end of the spectrum is JSON, which is very
popular because it is convenient and easy to learn.</div>
<div class="MsoNormal">
These formats are not easily splittable in a Big Data context, which makes them
difficult to process in parallel. Adding a compression mechanism on top
(Snappy, Gzip) does not solve the problem.</div>
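<div class="MsoNormal">
To make the compression point concrete, here is a minimal sketch using only Python's standard library. The sample records and the helper name <code>readable_from</code> are made up for illustration. It shows that a Gzip stream can only be decoded from its beginning, so a worker handed the second half of the file cannot start reading on its own. Snappy is not covered here.</div>

```python
import gzip
import zlib


def readable_from(blob: bytes, offset: int) -> bool:
    """Return True if blob[offset:] can be decompressed on its own."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # A slice cut out of the middle has no gzip header and no usable
        # decoder state, so decompression fails.
        return False


# Hypothetical sample data: 1,000 small CSV-like records.
records = "".join(
    f"user{i},tweet number {i},{1366150681 + i}\n" for i in range(1000)
)
packed = gzip.compress(records.encode("utf-8"))

# A reader starting at byte 0 works; a reader starting mid-file does not.
print("from start :", readable_from(packed, 0))
print("from middle:", readable_from(packed, len(packed) // 2))
```

<div class="MsoNormal">
This is why a single large Gzip file typically ends up being read by one task, whatever the size of the cluster.</div>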
<div class="MsoNormal">
Hence new data formats have emerged in recent years.</div>
<div class="MsoNormal">
Avro is widely used
as a common serialization platform, as it is interoperable across multiple
languages, offers a compact and fast binary format, supports dynamic schema
discovery (via its generic type) and
schema evolution, and is compressible and splittable. It also offers complex
data structures like nested types.</div>
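<div class="MsoNormal">
Schema evolution is easier to picture with a small sketch. The code below is a deliberately simplified illustration in plain Python, not the Avro library's API: the function <code>resolve</code>, the sample record and the extra <code>lang</code> field are all hypothetical. It mimics two rules from Avro's schema resolution: a field the reader expects but the writer never wrote is filled from the reader's default, and a field the reader does not know about is ignored.</div>

```python
from typing import Any, Dict, List


def resolve(record: Dict[str, Any],
            reader_fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Project a decoded record onto the reader's list of fields."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]        # field present in the data
        elif "default" in field:
            resolved[name] = field["default"]    # fall back to reader default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# A record written with an older schema (hypothetical values).
old_record = {"username": "alice", "tweet": "hello", "timestamp": 1366150681}

# A newer reader schema: drops "timestamp", adds "lang" with a default.
reader_fields = [
    {"name": "username", "type": "string"},
    {"name": "tweet", "type": "string"},
    {"name": "lang", "type": "string", "default": "en"},
]

print(resolve(old_record, reader_fields))
```

<div class="MsoNormal">
The real library resolves the writer's schema against the reader's schema while decoding the binary data, but the visible effect on each record is the same idea.</div>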
<div class="MsoNormal">
<br /></div>
<h3>
Example code</h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Let’s walk through an example: creating an Avro schema (in its JSON form)
and generating some data. In a real-world case, organizations
usually have data in a more mundane format such as XML, and they need
to translate it into Avro with tools like <a href="http://www.infoq.com/articles/AVROSchemaJAXB">JAXB</a>. Let’s use <a href="http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/">this
example</a>, with this twitter.avsc schema:</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">{</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; "type" : "record",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; "name" : "twitter_schema",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; "namespace" : "com.miguno.avro",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; "fields" : [</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; { "name" : "username",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "type" : "string",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "doc" : "Name of the user account on Twitter.com" },</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; {</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "name" : "tweet",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "type" : "string",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "doc" : "The content of the user's Twitter message" },</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; {</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "name" : "timestamp",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "type" : "long",</span></div>
<div class="MsoNormal">
<span style="color: #333333; font-family: Monaco; font-size: 10.0pt;">&nbsp; &nbsp; &nbsp; "doc" : "Unix epoch time in seconds" } ],</span></div>
<div class="MsoNormal">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"><span style="mso-spacerun: yes;"> </span></span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">"doc"</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">:</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">"A basic schema for
storing Twitter messages"</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">}</span><span style="font-family: Times; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
and some data in twitter.json:<span style="mso-tab-count: 1;"> </span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">{"username":"miguno","tweet":"Rock:
Nerf paper, scissors is fine.","timestamp":</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">1366150681</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">}</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">{"username":"BlizzardCS","tweet":"Works
as intended.<span style="mso-spacerun: yes;"> </span>Terran is IMBA.","timestamp":</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">1366154481</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;"> </span><span style="border: none windowtext 1.0pt; color: #333333; font-family: "inherit","serif"; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-border-alt: none windowtext 0in; mso-fareast-font-family: "Times New Roman"; padding: 0in;">}</span><span style="font-family: Times; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
We will convert the data (in JSON) into the binary Avro format.<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ java -jar ~/avro-tools-1.7.7.jar fromjson --schema-file
twitter.avsc twitter.json > twitter.avro </span><span style="color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
We will then generate Java classes from the Avro schema:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ java -jar /app/avro/avro-tools-1.7.7.jar compile schema
/app/avro/data/twitter.avsc /app/avro/data/</span><span style="color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier;"><o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Let’s now compile these classes, and package them in a Jar:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ CLASSPATH=/app/avro/avro-1.7.7-javadoc.jar:/app/avro/avro-mapred-1.7.7-hadoop1.jar:/app/avro/avro-tools-1.7.7.jar<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ javac -classpath $CLASSPATH
/app/avro/data/com/miguno/avro/twitter_schema.java<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;">$ jar cvf Twitter.jar com/miguno/avro/*.class<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
We can now fire up Spark, passing in the Jar we just created
as well as the needed libraries (Hadoop and Avro):<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">$ ./bin/spark-shell
--jars
/app/avro/avro-mapred-1.7.7-hadoop1.jar,/app/avro/avro-1.7.7.jar,/app/avro/data/Twitter.jar</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In the REPL, let’s then retrieve our data and create an RDD
from it, then retrieve an element of the data:<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">scala> <o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
com.miguno.avro.twitter_schema<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.file.DataFileReader;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.file.DataFileWriter;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.io.DatumReader;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.io.DatumWriter;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.specific.SpecificDatumReader;<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.mapreduce.AvroKeyInputFormat<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.mapred.AvroKey<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.hadoop.io.NullWritable<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.mapred.AvroInputFormat<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.mapred.AvroWrapper<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">import
org.apache.avro.generic.GenericRecord<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">val path =
"/app/avro/data/twitter.avro"<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">val avroRDD =
sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
AvroInputFormat[GenericRecord]](path)<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">avroRDD.map(l
=> l._1.datum.get("username").toString).first<o:p></o:p></span></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This returns:<o:p></o:p></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<br /></div>
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">res2: String =
miguno</span><span style="border: none windowtext 1.0pt; color: #333333; font-family: Monaco; font-size: 10.0pt; mso-bidi-font-family: Courier; mso-border-alt: none windowtext 0in; padding: 0in;"><o:p></o:p></span></div>
</div>
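<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As a minimal sketch of going one step further in the same REPL session, each record can be copied into plain Scala values before it is collected. This assumes the schema has exactly the three fields seen in twitter.json (username, tweet, timestamp) and that timestamp is the "long" field declared in the schema; the name tweets is only illustrative. Copying matters because Hadoop input formats may reuse the same record object between rows, and a GenericRecord is not guaranteed to be Java-serializable:</div>
<div style="background: white; border: solid #E7DEC3 1.0pt; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<pre style="color: black; font-family: 'Menlo Regular', monospace; font-size: 11.0pt;">// Sketch only: reuses the avroRDD value defined above.
// Field names are assumed from the sample data in twitter.json.
val tweets = avroRDD.map { case (wrapper, _) =>
  val record = wrapper.datum
  // Convert Avro values (Utf8, boxed Long) to plain Scala values
  // so that nothing Avro-specific leaves the executor.
  (record.get("username").toString,
   record.get("tweet").toString,
   record.get("timestamp").asInstanceOf[Long])
}

// Bring the (username, tweet, timestamp) tuples back to the driver.
tweets.collect().foreach(println)</pre>
</div>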
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A couple of notes:<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->We are using the MR1 classes (org.apache.avro.mapred), but the MR2
classes (org.apache.avro.mapreduce) work the same way, with a slightly different API.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->We are reading the data as GenericRecord rather than as the specific
class, even though we generated that class from the Avro schema (and imported it); with GenericRecord, fields are looked up by name. More on this at <a href="http://avro.apache.org/docs/current/gettingstartedjava.html">http://avro.apache.org/docs/current/gettingstartedjava.html</a><o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Even though the Avro classes were generated and compiled
as Java, you can import them in Spark, since Scala runs on the JVM.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Avro optionally lets you specify, per element in the schema,
the type to deserialize to, via a key/value pair,
which is convenient. See <a href="http://stackoverflow.com/questions/27827649/trying-to-deserialize-avro-in-spark-with-specific-type/27859980?noredirect=1%23comment44240726_27859980">http://stackoverflow.com/questions/27827649/trying-to-deserialize-avro-in-spark-with-specific-type/27859980?noredirect=1#comment44240726_27859980</a><o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->There are plenty of other ways to do this, one
being Kryo and another Spark SQL. The latter requires you to get
a Spark SQL context (see https://github.com/databricks/spark-avro), as opposed
to a pure Spark/Scala approach, but it may become the best practice in the
future.<o:p></o:p></div>
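<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
To illustrate the first note, here is a rough sketch of the same read done with the MR2 classes. It assumes the avro-mapred jar passed to spark-shell matches the Hadoop version Spark was built against (the hadoop1 build used above may not load against Hadoop 2), and the name avroRDD2 is only illustrative. No reader schema is set here, so the records should be decoded with the schema stored in the file:</div>
<div style="background: white; border: solid #E7DEC3 1.0pt; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<pre style="color: black; font-family: 'Menlo Regular', monospace; font-size: 11.0pt;">// Sketch only: MR2 ("new API") variant of the read shown earlier.
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

val path = "/app/avro/data/twitter.avro"

// Keys are AvroKey wrappers around each record; values are NullWritable.
val avroRDD2 = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
  AvroKeyInputFormat[GenericRecord]](path)

// Same extraction as before: unwrap the record and read one field by name.
avroRDD2.map(_._1.datum.get("username").toString).first</pre>
</div>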
<div class="MsoNormal">
<br /></div>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com2tag:blogger.com,1999:blog-170648781806274754.post-76721155967130053822014-11-17T14:57:00.001-08:002014-11-17T15:08:27.354-08:00Productionalization of Hadoop : Data governance<h1>
</h1>
<div class="MsoNormal">
Now that Hadoop has matured, it is natural to ask how it can
grow from a proof-of-concept (POC) project into a platform that is fully
integrated in the enterprise. In this post we will review what it takes to “productionalize”
Hadoop and the typical tasks involved.<o:p></o:p></div>
<div class="MsoNormal">
Implementing a new product in the enterprise usually
means adhering to the rules your IT team has laid out, often for
regulatory compliance. Setting aside the constraints specific to Hadoop, a
non-exhaustive list of requirements looks like this:<br />
<br />
- <b style="mso-bidi-font-weight: normal;">Error handling</b> and <b style="mso-bidi-font-weight: normal;">failure recovery</b>.<br />
- <b style="mso-bidi-font-weight: normal;">Logging</b> and <b style="mso-bidi-font-weight: normal;">monitoring</b>.<br />
- Data <b style="mso-bidi-font-weight: normal;">security</b> (encryption, access
authorization...).<br />
- <b style="mso-bidi-font-weight: normal;">Deployment</b>: Production vs. UAT
environments, data folder structure, and deployment automation.<br />
- <b style="mso-bidi-font-weight: normal;">Easy access</b> to data for
production support, for investigation purposes.<br />
- <b style="mso-bidi-font-weight: normal;">Disaster recovery</b> (DR).<br />
- <b style="mso-bidi-font-weight: normal;">Master Data</b> management: policies
around data frequencies, source availability.<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="margin-left: 9.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; tab-stops: 9.0pt; text-indent: -9.0pt;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Data <b style="mso-bidi-font-weight: normal;">quality</b>: enforcement through metadata-driven rules, hierarchies/attributes.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 9.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; tab-stops: 9.0pt; text-indent: -9.0pt;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->A <b style="mso-bidi-font-weight: normal;">testing
and integration</b> cycle, from unit testing to user acceptance
testing.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 9.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; tab-stops: 9.0pt; text-indent: -9.0pt;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Multi-tenancy</b>,
with the assumption that the product of choice is shared across projects. Attention
must be given to storage and processing capacity planning.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 9.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; tab-stops: 9.0pt; text-indent: -9.0pt;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Business
process integration</b>: policies around data frequencies, source availability.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 9.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; tab-stops: 9.0pt; text-indent: -9.0pt;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Lifecycle
management</b>: data retention, purge schedule, storage, archival.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: 9.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; tab-stops: 9.0pt; text-indent: -9.0pt;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Metadata</b>:
data definition, catalog, lineage.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Now let’s review this list in the context of Hadoop. A
number of these items are already addressed by the underlying framework
(regardless of the vendor), while others are handled by additional third-party
components. Some concepts are newer than others. Here they are in the
form of a table:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
<table border="1" cellpadding="0" cellspacing="0" class="MsoTableLightShadingAccent1" style="border-collapse: collapse; border: none; mso-border-alt: solid windowtext .5pt; mso-border-insideh: .5pt solid windowtext; mso-border-insidev: .5pt solid windowtext; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 1184;">
<tbody>
<tr>
<td style="background: #D9D9D9; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<b><u><span style="color: #365f91; font-size: 14.0pt; mso-bidi-font-size: 12.0pt; mso-themecolor: accent1; mso-themeshade: 191;">Feature<o:p></o:p></span></u></b></div>
</td>
<td style="background: #D9D9D9; border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<b><u><span style="color: #365f91; font-size: 14.0pt; mso-bidi-font-size: 12.0pt; mso-themecolor: accent1; mso-themeshade: 191;">Tool<o:p></o:p></span></u></b></div>
</td>
</tr>
<tr>
<td style="background: #D3DFEE; border-top: none; border: solid windowtext 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Error
handling<o:p></o:p></span></div>
</td>
<td style="background: #D3DFEE; border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Hadoop Core, Falcon<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Logging
and monitoring<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Ambari, Cloudera Manager<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="background: #D3DFEE; border-top: none; border: solid windowtext 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Security<o:p></o:p></span></div>
</td>
<td style="background: #D3DFEE; border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Sentry, Kerberos, Knox, Ranger,
Dataguise<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Automated
deployment<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Ambari, Cloudera Manager<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="background: #D3DFEE; border-top: none; border: solid windowtext 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Disaster
recovery<o:p></o:p></span></div>
</td>
<td style="background: #D3DFEE; border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Cloudera Backup and Disaster
Recovery, Falcon<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Master
data management<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Pentaho Kettle, Talend<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="background: #D3DFEE; border-top: none; border: solid windowtext 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Data
quality<o:p></o:p></span></div>
</td>
<td style="background: #D3DFEE; border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Talend<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Testing
and integration<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">Apache MRUnit, PigUnit<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="background: #D3DFEE; border-top: none; border: solid windowtext 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Multi-tenancy
support<o:p></o:p></span></div>
</td>
<td style="background: #D3DFEE; border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-themecolor: accent1; mso-themeshade: 191;">YARN, Hadoop schedulers<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Data
impact monitoring<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Cloudera Navigator, Falcon,
Ambari, Cloudera Manager<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="background: #D3DFEE; border-top: none; border: solid windowtext 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Infrastructure
monitoring<o:p></o:p></span></div>
</td>
<td style="background: #D3DFEE; border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Ganglia,
Nagios, Ambari, Cloudera Manager<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Life
cycle management<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Falcon, Cloudera Navigator<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="background: #D3DFEE; border-top: none; border: solid windowtext 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Metadata
management<o:p></o:p></span></div>
</td>
<td style="background: #D3DFEE; border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 63; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">HCatalog,
Cloudera Navigator (search, classification), Falcon (tag search)<o:p></o:p></span></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 110.7pt;" valign="top" width="111"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Tagging
and search<o:p></o:p></span></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 326.7pt;" valign="top" width="327"><div class="MsoNormal">
<span style="color: #365f91; mso-bidi-font-weight: bold; mso-themecolor: accent1; mso-themeshade: 191;">Cloudera Navigator, Falcon<o:p></o:p></span></div>
</td>
</tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As you can see, some features (like Security or Monitoring) are
covered in much the same way across vendors, and can be driven from the
vendor’s management UI (Cloudera Manager, Ambari). Others, like MDM and Data Quality, are offered by 3<sup>rd</sup>
party vendors. However, many of these features are
covered by a tool unique to each vendor (although sometimes offered as an open
source product). In this post we will cover Apache Falcon as it relates to some
unique aspects of productionizing a Hadoop cluster; unsurprisingly, its
concepts are similar to what Cloudera Navigator offers.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Why Falcon?<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
After a POC, data has landed, Hive and Pig scripts have
been written, data from disparate systems has been joined, and some
aggregation is showing up in a BI tool. The pipeline of work manifests itself as a set of
data flows.<o:p></o:p></div>
<div class="MsoNormal">
However, once this data pipeline needs to be put into
production, it generally needs to follow data governance requirements. This
usually materializes as a set of <b>Oozie
workflows</b>, along with the need to orchestrate other tools, like
distcp or Sqoop.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Apache Falcon attempts to answer data governance
requirements such as:<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]--><b>Data
impact analysis</b>: what happens if some data feed gets moved? What if someone
changes these files, who is going to be impacted by this?<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]--><b>Monitoring</b>:
there is a need to monitor not just the infrastructure, but also the full data
pipeline, as well as the ownership of it.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]--><b>Late data
handling</b>: data never arrives perfectly on time, given the variety of
sources; how should this be dealt with?<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]--><b>Replication,
retention of data</b>: different data sets generally have different replication
and retention policies, e.g. raw data vs. cleansed data vs. production data.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]--><b>Compliance</b>
for auditing<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
These requirements typically complicate an otherwise simple data pipeline. They
cannot be translated into a one-size-fits-all
template; a higher-level tool is needed to address them.
(Of note, Cloudera Navigator handles most of these requirements, except
late-data handling.)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Falcon automatically <b>generates
Oozie workflows</b> to answer these requirements, based on higher-level XML
entity definitions. This dramatically simplifies the process of meeting
them.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h3>
Vocabulary<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Falcon introduces the following concepts:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraphCxSpFirst" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]-->A <b>Cluster</b>,
which represents the interfaces to the JobTracker (JT) and NameNode (NN).<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]-->A <b>Feed</b>,
which is the data itself.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: 27.0pt; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]-->-<span style="font-family: 'Times New Roman'; font-size: 7pt; font-stretch: normal;">
</span><!--[endif]-->A <b>Process</b>,
which consumes or produces feeds.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
So essentially the user defines his/her entities via
these concepts in XML, then combines them to create <b>modular, reusable</b> data pipelines.<o:p></o:p></div>
<div class="MsoNormal">
The higher-order XML tags in Falcon currently cover out-of-the-box (OOTB)
policies like <b>replication, retention, and
late data handling</b>. They are also extensible, for example to
plug in encryption or general external transformations. Falcon also
allows engines other than the OOTB ones, like Spring Batch instead of Oozie, or
Cascading instead of Hive/Pig.<o:p></o:p></div>
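<div class="MsoNormal">
To make the entity idea more concrete, here is a minimal sketch in Python that builds a Falcon-style feed definition carrying tags, a late-arrival cut-off, and a retention policy. The namespace and element names are written from memory of the Falcon feed schema and should be checked against the XSD of your Falcon version; the feed name, cluster name, and path are hypothetical, and the result is not schema-complete (for instance, no validity window or ACL is set).</div>

```python
import xml.etree.ElementTree as ET

# Namespace and element names follow the Falcon feed schema as best recalled;
# verify them against the XSD of the Falcon version in use.
FEED_NS = "uri:falcon:feed:0.1"


def build_feed(name, cluster, path, tags, frequency="hours(1)",
               late_cutoff="hours(4)", retention="days(90)"):
    """Build an illustrative Falcon-style feed entity and return the root element."""
    feed = ET.Element("feed", {"name": name, "xmlns": FEED_NS})
    # Tags are key=value pairs, used for ownership and search.
    ET.SubElement(feed, "tags").text = ",".join(f"{k}={v}" for k, v in tags.items())
    ET.SubElement(feed, "frequency").text = frequency
    # Late data handling: how long after the nominal time data may still arrive.
    ET.SubElement(feed, "late-arrival", {"cut-off": late_cutoff})
    clusters = ET.SubElement(feed, "clusters")
    source = ET.SubElement(clusters, "cluster", {"name": cluster, "type": "source"})
    # Retention: how long feed instances are kept on this cluster before deletion.
    ET.SubElement(source, "retention", {"limit": retention, "action": "delete"})
    locations = ET.SubElement(feed, "locations")
    ET.SubElement(locations, "location", {"type": "data", "path": path})
    return feed


feed = build_feed(
    name="rawEmailFeed",                                      # hypothetical feed name
    cluster="primaryCluster",                                 # hypothetical cluster entity
    path="/data/raw/email/${YEAR}-${MONTH}-${DAY}-${HOUR}",   # hypothetical HDFS path
    tags={"owner": "X", "source": "DW"},
)
feed_xml = ET.tostring(feed, encoding="unicode")
print(feed_xml)
```

<div class="MsoNormal">
The point of the sketch is that retention and late-data handling are declared as attributes of the feed rather than hand-coded into each workflow, which is what makes the definitions reusable.</div>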
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h3>
Monitoring<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Monitoring</b> of the
data pipeline can be done with a combination of Falcon and Ambari. Ambari’s UI
lets you supervise the infrastructure, as well as Falcon alerts for pipeline
start, end, or error, pipeline run history, and scheduling.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h3>
Tracing: Lineage, Tagging, Search, Auditing<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Tracing in Falcon lets you visualize the different Falcon
<b>components (feeds, etc.) of the pipeline
linked together</b>. This addresses the case where you want to change
a certain step and first need a visual overview of how the steps connect. In
Falcon this is called <b>Lineage</b>
information; it is represented visually and stored internally in a graph database. This
answers the impact analysis requirement.<o:p></o:p></div>
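<div class="MsoNormal">
Impact analysis over lineage boils down to a graph traversal: start from the entity you want to change and collect everything downstream of it. The sketch below illustrates the idea with a plain Python dictionary; it is not Falcon's lineage API, and all entity names are hypothetical.</div>

```python
from collections import deque

# Hypothetical pipeline: each entity maps to the entities that consume its output.
LINEAGE = {
    "rawEmailFeed": ["cleanseEmailProcess"],
    "cleanseEmailProcess": ["cleansedEmailFeed"],
    "cleansedEmailFeed": ["aggregateProcess", "replicateToDR"],
    "aggregateProcess": ["reportFeed"],
    "replicateToDR": [],
    "reportFeed": [],
}


def downstream(graph, start):
    """Return every entity reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Who is impacted if the cleansed feed is moved or changed?
impacted = downstream(LINEAGE, "cleansedEmailFeed")
print(impacted)
```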
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Tagging attaches key/value pairs to <b>feeds and processes</b>, e.g. Owner = X, Data source = DW. This is
widely used to show ownership, business value, and the external source system
or the destination.<o:p></o:p></div>
<div class="MsoNormal">
Tags also allow you to <b>perform
search</b>, for example to retrieve all of the feeds that are tagged “secure”
and are owned by X.<o:p></o:p></div>
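<div class="MsoNormal">
As a rough illustration of tag-based search, the sketch below filters a set of entities by key/value tags. It assumes tags are stored as a comma-separated list of key=value pairs; the entity names and tag values are hypothetical, and this stands in for the idea only, not for Falcon's own search interface.</div>

```python
def parse_tags(tag_string):
    """Turn 'owner=X,classification=secure' into {'owner': 'X', ...}."""
    return dict(pair.split("=", 1) for pair in tag_string.split(",") if pair)


def search(entities, **wanted):
    """Return names of entities whose tags contain every wanted key/value pair."""
    return [
        name
        for name, tags in entities.items()
        if all(parse_tags(tags).get(key) == value for key, value in wanted.items())
    ]


# Hypothetical feeds and their tags.
ENTITIES = {
    "rawEmailFeed": "owner=X,classification=secure,source=DW",
    "clickFeed": "owner=Y,classification=secure",
    "reportFeed": "owner=X,classification=public",
}

# All feeds tagged "secure" and owned by X.
secure_owned_by_x = search(ENTITIES, owner="X", classification="secure")
print(secure_owned_by_x)
```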
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Auditing simply refers to <b>logging all changes</b> to the pipeline, i.e. the action taken along
with the user who took it.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h3>
Falcon user flow<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In a nutshell, the user flow in Falcon
goes as follows:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
1/ <b>Define</b> your pipeline
entities (feeds, processes, etc.) and the flows between them in XML.<o:p></o:p></div>
<div class="MsoNormal">
2/ <b>Submit</b> these
pipelines from the Falcon CLI. The Falcon server validates the specifications
given.<o:p></o:p></div>
<div class="MsoNormal">
3/ <b>Launch and
schedule</b>: the Oozie workflows are generated.<o:p></o:p></div>
<div class="MsoNormal">
4/ <b>Manage</b> the
workflow: at this point the user can check the status of the pipelines, and suspend or resume them for
updates.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h3>
Architecture<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b>Falcon server </b>is
essentially a centralized orchestration framework. It stores the XML definitions
and handles the JMS notification mechanism (via ActiveMQ) that clients subscribe to in order to receive
notifications about the pipeline. Ambari manages all of this.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Closing thoughts on Falcon<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Apache Falcon (and on the Cloudera side, Navigator) aims to
answer the data governance requirements mandated by data stewards. The
higher-level definition and language of Falcon with regard to its entities
(processes, feeds) is a step in the right direction. In practice, Falcon is
still relatively immature at this point (Nov, 2014): lineage is limited (very
limited UI, REST API hard to use), JMS notifications cover completed
events only, the late-data handling API is too coarse, and there is no Sqoop support. I
believe these limitations are only temporary.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoNormal">
<br /></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com2tag:blogger.com,1999:blog-170648781806274754.post-72585163249500892872014-11-10T15:09:00.003-08:002015-10-23T17:10:55.053-07:00An overview of Apache Spark<div dir="ltr" style="text-align: left;" trbidi="on">
<h1>
Overview of Apache Spark<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
<u><span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">Origins<o:p></o:p></span></u></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-tab-count: 1;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>It is
widely acknowledged that Map Reduce handles multi-stage queries poorly, and is a rather cumbersome solution for <b style="mso-bidi-font-weight: normal;">iterative</b> applications, interactive queries, and some machine learning
algorithms. These areas were the motivation for creating Spark.
Spark’s goal is two-fold: <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><!--[endif]-->Provide primitives for an <b style="mso-bidi-font-weight: normal;">in-memory</b>, Map Reduce-like engine that supports iterative algorithms.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Not be restricted to just Map Reduce, and replace that model with a DAG engine instead of discrete M/R steps. Spark
provides a set of operators that are <b style="mso-bidi-font-weight: normal;">higher-level,
expressive</b>, and offer clean APIs.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The philosophy of Spark (read: business model?) sets it apart from
many other open source projects: Spark was built and marketed to get
maximum exposure and to become popular as a useful, notable distributed
framework that replaces Map Reduce, as opposed to being just a university
research project made open source after the fact with no real support.<o:p></o:p></div>
<div class="MsoNormal">
In light of this, Spark offers powerful standard libraries,
a relational data model, and canned machine learning algorithms that we’ll
review in more detail. Another definite advantage is Spark’s support
across many environments: it is cloud-friendly and runs on several resource managers (YARN, Mesos,
or its own standalone manager). One of the biggest advantages Spark has to offer is
a <b style="mso-bidi-font-weight: normal;">unified API
and framework </b>across data analysis contexts: online, offline, machine
learning, graph.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Spark aims to be <b style="mso-bidi-font-weight: normal;">compatible</b>
with Hadoop’s InputFormat and OutputFormat APIs, meaning users can plug Spark
into their existing Hadoop cluster and use it directly on top of
the data already there; Spark supports HDFS, but also HBase, S3, Cassandra, etc.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In terms of support, Spark releases are pretty regimented
(every 3 months), and are defined by time, not by scope; again this is to
give <b style="mso-bidi-font-weight: normal;">predictability</b> to users and a
sense of stability to the project, as opposed to <a href="http://blog.cloudera.com/blog/2012/04/apache-hadoop-versions-looking-ahead-3/">Hadoop in its earlier years</a>, and probably also to drive user adoption of Spark.
Suffice it to say that Apache Spark is reportedly the most active project in the Apache
foundation as of Nov, 2014, and boasts rapid growth. And for
good reason: Spark <b style="mso-bidi-font-weight: normal;">broke the performance
record</b> for the <a href="https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html">sorting benchmark
of 100TB in 23 mins</a> on about 200 machines (Oct, 2014), as a mostly network-bound
job; comparatively, Hadoop doesn’t take advantage of the hardware
resources as efficiently. It is worth noting that this run did not rely on memory alone (100TB cannot easily be held in memory) but actually spilled to disk, which shows that Spark also performs well when working from disk.<o:p></o:p></div>
<h2>
Map Reduce versus Spark<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span>In Map
Reduce (MR), data is <b style="mso-bidi-font-weight: normal;">written back</b> to
HDFS between iterations. Bear in mind that MR was designed 10+ years ago when
memory was expensive. Consider a set of Hive queries on the same data:<o:p></o:p></div>
<div class="MsoNormal">
<div class="MsoNormal">
<br /></div>
<div style="background: white; border: solid #E7DEC3 1.0pt; mso-border-alt: solid #E7DEC3 .75pt; mso-element: para-border-div; padding: 10.0pt 12.0pt 10.0pt 12.0pt;">
<div class="MsoNormal" style="background: white; border: none; line-height: 17.4pt; mso-border-alt: solid #E7DEC3 .75pt; mso-padding-alt: 10.0pt 12.0pt 10.0pt 12.0pt; padding: 0in; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt; vertical-align: baseline;">
<span style="line-height: normal;">HDFS input</span></div>
<div class="MsoListParagraphCxSpFirst" style="margin-left: 79.0pt; mso-add-space: auto; mso-list: l0 level1 lfo2; text-indent: -.25in;">
read -> query1<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: 79.0pt; mso-add-space: auto; mso-list: l0 level1 lfo2; text-indent: -.25in;">
read -> query2</div>
</div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraph" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><!--[endif]-->The ‘read’ portion is considerably slow in MR,
due <b style="mso-bidi-font-weight: normal;">to 3 phases</b>: replication,
serialization, and disk I/O. In fact, reportedly about <b style="mso-bidi-font-weight: normal;">90% of the time</b> on average is spent on these phases, instead of computing the
actual algorithm from the query itself! The same principle applies to some
machine learning algorithms, like gradient descent (which is essentially a
for-loop that descends towards a local minimum, re-reading the same data at every iteration). <o:p></o:p></div>
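<div class="MsoNormal">
The cost of re-reading the input for every query can be illustrated with a toy Python model. Everything here is an assumption made for illustration: <code>CountingSource</code> stands in for the HDFS input, and one <code>read()</code> call stands in for the whole replication, serialization and disk I/O cost; no Hadoop or Spark API is involved.</div>

```python
# Toy model: count how often the "HDFS" input is read. All names are hypothetical.
class CountingSource:
    def __init__(self, rows):
        self._rows = list(rows)
        self.reads = 0

    def read(self):
        # Stands in for replication + deserialization + disk I/O.
        self.reads += 1
        return list(self._rows)


def run_queries_mr_style(source, queries):
    """Each query re-reads the input from storage, as in the diagram above."""
    return [query(source.read()) for query in queries]


def run_queries_cached(source, queries):
    """The input is read once and kept in memory for every query."""
    cached = source.read()
    return [query(cached) for query in queries]


queries = [sum, max, len]  # three "queries" over the same data
mr_source = CountingSource(range(1, 11))
cached_source = CountingSource(range(1, 11))

mr_results = run_queries_mr_style(mr_source, queries)
cached_results = run_queries_cached(cached_source, queries)

print("MR-style reads:", mr_source.reads)
print("cached reads:", cached_source.reads)
```

<div class="MsoNormal">
Both runs return the same answers; only the number of reads differs, and that number grows with every extra query or every extra iteration of a loop such as gradient descent.</div>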
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Besides the in-memory primitives and the new operators,
Spark’s execution engine often <b style="mso-bidi-font-weight: normal;">avoids multiple passes</b> over the data; a Map + Reduce (Group
by) + Reduce (Sort) sequence can sometimes be done in one pass in Spark, faster than in MR, even
without using the in-memory paradigm.<o:p></o:p></div>
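<div class="MsoNormal">
The single-pass idea can be sketched in plain Python, assuming simple element-wise operators only (this toy example does not model the shuffle that a real group-by or sort requires, and it is not how Spark is implemented). The function names and the <code>trace</code> list are hypothetical; the trace just records the order in which operators touch the records.</div>

```python
def staged(records, trace):
    """Two discrete steps: the first result is fully materialized
    (in MR it would be written out) before the second step starts."""
    parsed = []
    for r in records:
        trace.append(("parse", r))
        parsed.append(int(r))
    out = []
    for n in parsed:
        if n % 2 == 0:
            trace.append(("square", n))
            out.append(n * n)
    return out


def pipelined(records, trace):
    """Same logic as chained lazy operators: each record flows through
    both operators before the next record is read, i.e. a single pass."""
    def parse(rs):
        for r in rs:
            trace.append(("parse", r))
            yield int(r)

    def square_evens(ns):
        for n in ns:
            if n % 2 == 0:
                trace.append(("square", n))
                yield n * n

    return list(square_evens(parse(records)))


records = ["1", "2", "3", "4"]
staged_trace, pipelined_trace = [], []
staged_out = staged(records, staged_trace)
pipelined_out = pipelined(records, pipelined_trace)

print(staged_trace)
print(pipelined_trace)
```

<div class="MsoNormal">
The two traces contain the same operations in a different order: the staged version finishes every "parse" before the first "square", while the pipelined version interleaves them and never holds the intermediate list.</div>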
<div class="MsoNormal">
<br /></div>
<h2>
Advantages of Spark<o:p></o:p></h2>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><!--[endif]-->In practice, Spark is commonly reported to be around <b style="mso-bidi-font-weight: normal;">10x faster</b> than Hadoop on average.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Agility</b>
compared to the monolithic aspect of Hadoop: Spark allows rapid changes, thanks
to loading the data into memory and interacting with it in a rapid manner. The
shell (<b style="mso-bidi-font-weight: normal;">REPL</b>) is great to test things
out.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Data
scientists & non-data engineers</b> can use Spark through Python.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Newer platform</b>
with multiple tools like Machine learning, Graph and streaming included, with
strong community support.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Scala is
superior</b> for data processing thanks to its higher level of abstraction.
Although Spark supports Java, Scala is the recommended language, as a non-functional programming style makes the code less intuitive and lower level. Using
Scala also makes <b style="mso-bidi-font-weight: normal;">debugging</b>
easier. Combining an IDE (for API autocompletion and type information)
with the REPL (interactive shell) is actually the most efficient setup.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Databricks makes <b style="mso-bidi-font-weight: normal;">cluster provisioning </b>very easy.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
However, Spark is definitely less mature than Hadoop, is more
bug-prone, and doesn’t yet have a good solution for managing and monitoring the
system.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
<u>Spark </u><u><span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">architecture</span><o:p></o:p></u></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Architecturally Spark is made up of 2 concepts:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;">Resilient
Distributed Datasets</b> (RDDs): an RDD is a collection of Java objects spread
across the cluster, i.e. a distributed
collection/dataset. This gives direct access to a higher-level API. An RDD is split into partitions. Each partition must fit on one node, and a node can host multiple partitions. An RDD is typed like a Scala collection (e.g. RDD[Int]); for example, reading a text file line by line in Spark returns an RDD[String].<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l1 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->A <b style="mso-bidi-font-weight: normal;">DAG
execution engine</b>, built as a Master-Slave architecture. The master is called the
Driver, the slaves the Executors/Workers (2 processes). Executors are where the computation is run and data is cached. They keep running even when there are no running jobs,
which <b style="mso-bidi-font-weight: normal;">avoids the JVM start</b> time incurred
by MR’s task trackers. The downside is that you get a <b style="mso-bidi-font-weight: normal;">fixed number</b> of Executors, remedied only
if you use YARN’s Resource Manager to allocate resources dynamically. The Driver controls the program flow and executes the steps.<br />
<br />
It should be noted that Spark
also has an <b style="mso-bidi-font-weight: normal;">elastic scaling</b> feature for
ETL jobs, a la EMR, where you set your configuration to allow a
minimum/maximum number of executor instances.</div>
<div class="MsoNormal" style="margin-left: .25in;">
<o:p></o:p></div>
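<div class="MsoNormal">
To make the partitioning idea concrete, here is a minimal plain-Python sketch (not Spark code). The helper names <code>split_into_partitions</code> and <code>assign_to_nodes</code>, the node names, and the round-robin placement are all assumptions made for illustration; Spark's real placement logic is more involved.</div>

```python
from typing import Dict, List, Sequence


def split_into_partitions(records: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Split a dataset into contiguous partitions, as evenly as possible."""
    base, extra = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions receive one additional record.
        size = base + int(extra > i)
        partitions.append(list(records[start:start + size]))
        start += size
    return partitions


def assign_to_nodes(num_partitions: int, nodes: Sequence[str]) -> Dict[str, List[int]]:
    """Round-robin placement (an assumption): each partition lives on exactly
    one node, while a node may host several partitions."""
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for p in range(num_partitions):
        placement[nodes[p % len(nodes)]].append(p)
    return placement


# Ten text lines, comparable to what reading a text file would yield.
lines = ["line %d" % i for i in range(10)]
partitions = split_into_partitions(lines, 4)
placement = assign_to_nodes(len(partitions), ["node-a", "node-b"])
```

<div class="MsoNormal">
With 4 partitions and 2 hypothetical nodes, each node ends up hosting 2 partitions, and no partition is split across nodes.</div>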
<div class="MsoNormal">
<br /></div>
<h3>
Fault Tolerance<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span>Fault
tolerance is provided by having the RDDs track the series of transformations
making up the data processing, in an operation log called the lineage, and <b style="mso-bidi-font-weight: normal;">recompute
the lost data</b> in case of failure. This is the 'resilient' part of the RDD acronym. It is worth noting that the lineage idea is nothing new: it comes from the M/R paradigm, where a failed map task does not fail the whole Hadoop job but is instead recomputed on a different node. However, in M/R a data analysis made of multiple steps (filter, group, etc.) is implemented as one M/R job per step, so there is no resiliency built in across jobs; in Spark the computation is much more fluid and resiliency works across the data analysis steps.<o:p></o:p></div>
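<div class="MsoNormal">
The lineage mechanism can be illustrated with a small plain-Python sketch. The class <code>LineageDataset</code> and its methods are hypothetical names invented for this example, not Spark's API; the point is only that each dataset remembers its parent and the transformation applied, so lost results can be rebuilt across several analysis steps.</div>

```python
class LineageDataset:
    """A toy dataset that remembers how it was derived, so that a lost
    in-memory result can be rebuilt from its parent."""

    def __init__(self, source=None, parent=None, transform=None):
        self.source = source        # durable input (e.g. the lines of a file)
        self.parent = parent        # upstream dataset in the lineage
        self.transform = transform  # function applied to the parent's rows
        self.cached = None          # in-memory result, which may be lost

    def map(self, fn):
        return LineageDataset(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return LineageDataset(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def compute(self):
        # Walk back up the lineage only as far as needed.
        if self.cached is None:
            if self.parent is None:
                self.cached = list(self.source)
            else:
                self.cached = self.transform(self.parent.compute())
        return self.cached

    def lose(self):
        """Simulate an executor failure dropping the in-memory data."""
        self.cached = None


base = LineageDataset(source=[1, 2, 3, 4, 5, 6])
evens = base.filter(lambda x: x % 2 == 0)
squares = evens.map(lambda x: x * x)

first = squares.compute()
evens.lose()
squares.lose()
recomputed = squares.compute()  # rebuilt from the lineage, across both steps
```

<div class="MsoNormal">
Both the filter step and the map step are recovered from the same lineage chain, which is the cross-step resiliency described above.</div>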
<div class="MsoNormal">
Also, the data is <b style="mso-bidi-font-weight: normal;">replicated
into memory</b> twice, to guard against the failure of an Executor. However, if
replication has not completed before the node fails, data may be lost. <o:p></o:p></div>
<div class="MsoNormal">
Provisioning Spark on YARN allows the Resource Manager
to spin up the Spark Master fail-over proxy (and the same applies in Mesos) and <a href="http://blog.cloudera.com/blog/2014/05/how-apache-hadoop-yarn-ha-works/">offer
HA</a>; another way is to set up ZooKeeper for Spark’s Driver in standalone
mode, to mitigate some of the resiliency problems in case the Driver goes
down.<o:p></o:p></div>
<h3>
Using memory<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span>Spark uses
memory for fast computation. However, if <b style="mso-bidi-font-weight: normal;">memory
is unavailable</b>, Spark will gracefully spill to disk. The strategy used for
this is Least-Recently-Used: the dataset that was used least recently is spilled to
disk first.<o:p></o:p></div>
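<div class="MsoNormal">
As an illustration of the Least-Recently-Used policy, here is a minimal plain-Python sketch. The class <code>LruMemoryStore</code>, its capacity counted in number of datasets, and the dictionary standing in for the disk are all assumptions for this example; it is not how Spark's block manager is implemented.</div>

```python
from collections import OrderedDict


class LruMemoryStore:
    """Keeps at most `capacity` datasets in memory and spills the least
    recently used one to a stand-in 'disk' when memory is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # most recently used entries are at the end
        self.disk = {}               # stand-in for spilled data

    def put(self, name, data):
        self.disk.pop(name, None)
        self.memory[name] = data
        self.memory.move_to_end(name)
        while len(self.memory) > self.capacity:
            # popitem(last=False) removes the least recently used entry.
            victim, spilled = self.memory.popitem(last=False)
            self.disk[victim] = spilled

    def get(self, name):
        if name in self.memory:
            self.memory.move_to_end(name)  # mark as recently used
            return self.memory[name]
        data = self.disk[name]  # raises KeyError for an unknown dataset
        self.put(name, data)    # bring it back into memory
        return data


store = LruMemoryStore(capacity=2)
store.put("a", [1])
store.put("b", [2])
store.get("a")        # "a" is now more recently used than "b"
store.put("c", [3])   # memory is full, so "b" is spilled
in_memory = list(store.memory)
on_disk = list(store.disk)
```

<div class="MsoNormal">
Reading a spilled dataset brings it back into memory and may in turn push another dataset out.</div>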
<div class="MsoNormal">
Currently the Spark user has to <b style="mso-bidi-font-weight: normal;">specify</b> which dataset (once processed) is to be kept in memory.
Automatic adaptive caching in memory is currently a subject of research and is
not possible today.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h3>
Security<o:p></o:p></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-tab-count: 1;"> </span><b style="mso-bidi-font-weight: normal;">Security features</b> like Kerberos can be
set up on Spark as long as Spark is used in a YARN configuration. <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
<u><span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">Spark ecosystem<o:p></o:p></span></u></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Spark has essentially expanded its <b>ecosystem of tools</b> to provide a one-stop shop for doing all kinds
of analytics. Let’s review these components.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
<span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">Spark</span> <span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">SQL<o:p></o:p></span></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Spark SQL is a complete (and faster) rewrite of what used
to be Shark, a replacement for Hive: <b>a
tool on top of Spark with relational semantics</b>. Spark SQL is implemented to
be compatible with Hive, so existing Hive tables can still be used within
Spark SQL. <o:p></o:p></div>
<div class="MsoNormal">
Spark SQL does not cache Hive records as Java objects, which
would incur too much overhead. Instead it uses <b>column-oriented storage</b> using primitive types (int, string, etc),
similar to Parquet or ORC, with the same advantages of faster response time due
to only scanning the needed columns, auto selection of best compression
algorithm per column, etc.<o:p></o:p></div>
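<div class="MsoNormal">
The benefit of a column-oriented layout can be sketched in a few lines of plain Python. The sample records, the helper names <code>to_columns</code> and <code>run_length_encode</code>, and the choice of run-length encoding are assumptions for illustration; this is not Spark SQL's actual in-memory format.</div>

```python
def to_columns(rows):
    """Pivot a list of row dictionaries into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeats into [value, count] pairs; this works
    well on columns with few distinct values."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Hypothetical sample records.
rows = [
    {"user": "ann", "age": 31, "country": "FR"},
    {"user": "bob", "age": 45, "country": "FR"},
    {"user": "cy", "age": 26, "country": "US"},
    {"user": "dee", "age": 38, "country": "US"},
]

columns = to_columns(rows)
# An aggregate on one column only scans that column's values.
average_age = sum(columns["age"]) / len(columns["age"])
# Each column can pick the compression scheme that suits it best.
country_rle = run_length_encode(columns["country"])
```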
<div class="MsoNormal">
The icing on the cake is that Spark SQL
code can be mixed with “pure” Spark code, and <b>one can be called from the other</b>. The Spark/Scala console
also makes this easy.<o:p></o:p></div>
<div class="MsoNormal">
Spark SQL is compatible with Hive 0.13.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
<span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">Spark</span> <span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">Streaming</span><o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Spark was extended to perform <b>stream computations</b>. Spark Streaming runs as
a <b>series of small batch jobs</b>, each covering
a short (configurable) time interval; this is also called micro-batching. The state of the
RDDs for these jobs is kept in memory.<o:p></o:p></div>
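<div class="MsoNormal">
Micro-batching can be sketched in plain Python as follows. The function <code>micro_batches</code>, the sample events and the one-second interval are assumptions for illustration; in this simplified version, intervals with no events produce no batch.</div>

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive small batches,
    one per batch interval, ordered by time."""
    batches = defaultdict(list)
    for timestamp, value in events:
        # Events whose timestamps fall in the same interval share a batch.
        batches[int(timestamp // batch_interval)].append(value)
    return [batches[index] for index in sorted(batches)]


# Hypothetical events: (seconds since start, payload).
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = micro_batches(events, batch_interval=1.0)
counts = [len(batch) for batch in batches]  # one small batch job per interval
```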
<div class="MsoNormal">
The lowest latency in Spark Streaming is on the <b>order of seconds,</b> not less; this usually accommodates 90% of streaming use
cases. Storm, on the other hand, can handle discrete events in a
flow of steps, akin to a CEP system in terms of speed of processing. <o:p></o:p></div>
<div class="MsoNormal">
Regarding <b>fault-tolerance</b>,
Spark Streaming offers a write-ahead log for full HA operation. Comparatively, in
Storm, if the supervisor fails, the data gets reassigned and replayed, at the
cost of having processing done twice.<o:p></o:p></div>
<div class="MsoNormal">
Streaming frameworks usually consist of a pipeline of
nodes, where each node maintains a mutable state. However, this state is lost if the
node fails. So in Storm, for example, each record is processed
<b>at least</b> once, and in the worst case multiple times. A remedy to this in Storm is to
use the Trident API, which works by micro-batching and offers transactions
to update states. However, this comes with a lower throughput.<o:p></o:p></div>
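<div class="MsoNormal">
The difference between at-least-once processing and transactional state updates can be sketched in plain Python. The names <code>naive_count</code> and <code>TransactionalCounter</code> are hypothetical, and the approach (storing the id of the last applied batch next to the state) is only loosely modeled on the idea of transactional micro-batches; it is not Trident's actual API.</div>

```python
def naive_count(deliveries):
    """At-least-once processing: a replayed batch is counted again."""
    counts = {}
    for _batch_id, words in deliveries:
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    return counts


class TransactionalCounter:
    """Stores the id of the last applied batch next to the state, so that a
    replayed batch is detected and skipped."""

    def __init__(self):
        self.counts = {}
        self.last_batch_id = None

    def apply_batch(self, batch_id, words):
        # Assumes batches arrive in increasing id order, replays aside.
        if self.last_batch_id is not None and self.last_batch_id >= batch_id:
            return False  # already applied: ignore the replay
        for word in words:
            self.counts[word] = self.counts.get(word, 0) + 1
        self.last_batch_id = batch_id
        return True


# Batch 2 is delivered twice, as could happen after a node failure.
deliveries = [(1, ["a", "b"]), (2, ["a"]), (2, ["a"])]

naive = naive_count(deliveries)
state = TransactionalCounter()
applied = [state.apply_batch(batch_id, words) for batch_id, words in deliveries]
```

<div class="MsoNormal">
The naive counter over-counts the replayed record, while the transactional one applies each batch once, at the price of extra bookkeeping on every update.</div>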
<div class="MsoNormal">
Also, <b>Storm has a
lower-level API</b> than Spark Streaming does. And it has no built-in concept of
look-back aggregation, nor a way to easily combine batch with streaming.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Technically, Spark Streaming offers a <b>new interface</b>, DStream, to deal with streaming, as well as a new
operator to work on a timed window: ‘reduceByWindow()’ for incremental
computation, which takes the window length and the sliding interval as arguments. This
can also be run on a per-key basis.<o:p></o:p></div>
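<div class="MsoNormal">
The window length and sliding interval can be illustrated with a plain-Python sketch. The function <code>reduce_by_window</code> below is a hypothetical stand-in, not the DStream operator: both parameters are expressed as a number of micro-batches rather than as durations, and each window is recomputed from scratch instead of incrementally.</div>

```python
from functools import reduce


def reduce_by_window(batches, reduce_fn, window_length, slide_interval):
    """Every `slide_interval` batches, reduce the values of the last
    `window_length` batches into a single result."""
    results = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        values = [value for batch in batches[start:end] for value in batch]
        if values:  # skip windows that contain no data
            results.append(reduce(reduce_fn, values))
    return results


# Four hypothetical micro-batches of numeric values.
batches = [[1, 2], [3], [4, 5], [6]]

# Window of 2 batches, sliding by 1 batch: consecutive windows overlap.
sliding_sums = reduce_by_window(batches, lambda a, b: a + b, window_length=2, slide_interval=1)

# Window of 2 batches, sliding by 2 batches: windows do not overlap.
tumbling_sums = reduce_by_window(batches, lambda a, b: a + b, window_length=2, slide_interval=2)
```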
<div class="MsoNormal">
One of the advantages of Spark Streaming is code reuse, and
intermixing of it with standard Spark code, i.e. the ability to mix batch
(offline) with real-time (online) computing.<o:p></o:p></div>
<div class="MsoNormal">
Some ML algorithms, like K-means, are also provided as libraries that
can be run online.<o:p></o:p></div>
<h2>
<span style="font-size: 14.0pt; mso-bidi-font-size: 13.0pt;">GraphX<o:p></o:p></span></h2>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoNormal">
Spark was extended to add a graph-processing library.<o:p></o:p></div>
</div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-19404028801248236762014-10-07T16:02:00.001-07:002014-10-07T16:06:28.940-07:00Common distributed systems implementation practices at different companies<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Reflecting back on a number of projects that I have worked
on or encountered at different companies, I want to share a few trends that I have
seen emerge in how distributed systems are being used:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
</div>
<ul>
<li><span style="text-indent: -0.25in;"><span style="font-family: 'Times New Roman'; font-size: 7pt;"> </span></span><span style="text-indent: -0.25in;">A lot of companies are still at </span><b style="text-indent: -0.25in;">early stages</b><span style="text-indent: -0.25in;"> with Hadoop and
distributed systems. The main use case aims to supplement/replace more
expensive systems, like Teradata or Oracle. A common trend is to start doing
Data science (80% of cases being about recommendation engines).</span></li>
<li><span style="text-indent: -0.25in;"><span style="font-family: 'Times New Roman'; font-size: 7pt;"> </span></span><b style="text-indent: -0.25in;">Kafka</b><span style="text-indent: -0.25in;">
is now commonly used, more so these days than Flume or other messaging systems like
RabbitMQ, to retrieve logs from a live application
and ingest them into Hadoop. I believe this is thanks to Kafka’s relative ease of
deployment and its reputation for scalability and fast ingest time.</span></li>
<li><span style="text-indent: -0.25in;"> The demand for </span><b style="text-indent: -0.25in;">resource managers</b><span style="text-indent: -0.25in;"> in distributed systems while generating a lot of
hype, is almost nonexistent! Most small/medium-sized companies just don’t want
to bother with adding an extra layer on top of their cluster to control
application/user management, like Mesos, and prefer to run </span><b style="text-indent: -0.25in;">multiple</b><span style="text-indent: -0.25in;">
clusters, one per usage: one for Production, one for Development, etc. Some
startups even go further, and use a cluster per engineer! Why? Because
typically, these companies don’t have that much data to wrestle with, and
spinning up a cluster dynamically on AWS is very cheap. Even bigger
companies do a lot of experimentation: Apple disclosed recently at the 2014 Cassandra
summit that they use upwards of 75k Cassandra nodes, across a multitude of
clusters. So resource management across applications is generally low priority
in practice; in the Hadoop world, resource management is generally limited to
disk quotas and an out-of-the-box priority queueing system on a user group basis.</span></li>
<li><span style="text-indent: -0.25in;"> Companies generally don’t have a well-thought-out
</span><b style="text-indent: -0.25in;">big data architecture</b><span style="text-indent: -0.25in;">, and are
willing to try different tools and live with a hodgepodge of them; examples
are Airbnb, which runs both Hive and Presto concurrently on the same data, and
Twitter, with a fair number of Scalding users but also Spark.</span></li>
<li><span style="text-indent: -0.25in;"> Avro is typically the </span><b style="text-indent: -0.25in;">data format</b><span style="text-indent: -0.25in;"> most used by small/medium companies, thanks to its
support for schema evolution and multiple languages. Parquet is recognized
as the newcomer, and thought of as more performant and more cross-platform than
its nemesis, ORC, which really only supports Hive. Also of note is the shift of
programmers from Java to Python in the data world.</span></li>
<li><span style="text-indent: -0.25in;"> </span><b style="text-indent: -0.25in;">Spark</b><span style="text-indent: -0.25in;">
as a distributed framework for big data is slowly taking over, as people are realizing that
Hadoop APIs are just too low-level, and that Cascading APIs are good but frankly not
as well written as Spark’s. Also, regarding resource management,
Databricks/Spark is rumored to be getting its own resource management tool soon, which
would remove the need for YARN’s in a Spark-only environment.</span></li>
</ul>
<br />
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<o:p></o:p></div>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" Priority="39" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" Name="toc 9"/>
<w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" Priority="10" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" Priority="11" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" Priority="22" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" Priority="59" SemiHidden="false"
UnhideWhenUsed="false" Name="Table Grid"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
<!--[if gte mso 10]>
<style>
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin:0in;
mso-para-margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:Cambria;
mso-ascii-font-family:Cambria;
mso-ascii-theme-font:minor-latin;
mso-hansi-font-family:Cambria;
mso-hansi-theme-font:minor-latin;}
</style>
<![endif]-->
<!--StartFragment-->
<!--EndFragment--><br />
<div class="MsoNormal">
<br /></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-91843996695597289252014-08-05T16:33:00.003-07:002018-06-12T09:52:17.276-07:00How to perform capacity planning for a Hadoop cluster<div dir="ltr" style="text-align: left;" trbidi="on">
<h1>
Provisioning Hadoop machines<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Recently a customer asked me what kind of machine to
purchase for a Hadoop environment, and what configuration to use. The
answer can essentially be derived from a few simple calculations, which I
want to walk through here.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The number of machines, and their specs, depend
on a few factors: the volume of data (obviously), the data retention policy
(how much you can afford to keep before throwing data away), the type of workload
you have (data science/CPU-bound vs “vanilla” use case/IO-bound), and the
data storage mechanism (data container, type of compression used if any). We
have to make some assumptions from the beginning; otherwise there are just too
many parameters to deal with. These assumptions drive the configuration of the
data nodes.</div>
<div class="MsoNormal">
The other types of machines (NameNode/JobTracker, in
Hadoop 1) need different specs, and are generally more straightforward to size. We’ll
only talk about data nodes in this post.</div>
<h2>
Capacity planning for your data<o:p></o:p></h2>
<div class="MsoNormal">
The number
of machines to purchase depends on the volume of data to store and analyze,
which drives the number of spinning disks to get on a per-machine basis
(usually a fixed number of hard drives per machine). The below applies mainly to Hadoop 1.x versions. We will talk about YARN later on.<br />
Capacity planning usually flows from a top-down approach of understanding:<br />
- How many nodes you need<br />
- What the capacity of each node is on the CPU side<br />
- What the capacity of each node is on the memory side.</div>
<div class="MsoNormal">
Let's do a back-of-the-envelope calculation. This is usually the first estimate to make when assessing what you need and what to budget for the machines.</div>
<h3>
Data nodes<o:p></o:p></h3>
<div class="MsoNormal">
HDFS is usually configured to replicate the
data 3 ways, so you will need 3x the actual storage capacity for your data. In
addition, you need to leave headroom on each machine for temporary storage
used during computation: transient Map outputs stay local to the
machine and are not stored on HDFS, and compression also needs local scratch
space. A good rule of thumb is to keep the disks at 70% capacity. Finally, we
need to take the compression ratio into account.</div>
<div class="MsoNormal">
Let’s take an example:<o:p></o:p></div>
<div class="MsoNormal">
Say we have 70 TB of raw data to store on a yearly basis
(i.e. a moving window of 1 year). After compression (say, Gzip with <a href="http://www.gnu.org/software/gzip/manual/gzip.html">a 60% size reduction</a>) we will
get:<br />
<br />
<ul style="text-align: left;">
<li> 70 - (70 * 60%) = 28 TB </li>
<li>that we multiply by 3 for replication = 84 TB, </li>
<li>but we keep the disks at 70% capacity: 84 TB = x * 70%, thus x = 84/70% = <b>120 TB</b> is the value we need for capacity planning for data durability, for 70 TB of raw data.</li>
</ul>
<o:p></o:p></div>
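<div class="MsoNormal">
The arithmetic above can be captured in a few lines of Python. This is only a sketch: the function name and parameter names are made up for illustration, and the defaults are simply the assumptions from the example (60% compression savings, 3x replication, disks kept at 70% full).</div>

```python
def required_disk_capacity_tb(raw_tb, compression_savings=0.60,
                              replication=3, max_disk_fill=0.70):
    """Disk capacity (in TB) to provision for a given volume of raw data.

    All defaults are the assumptions of the worked example, not Hadoop settings.
    """
    compressed_tb = raw_tb * (1 - compression_savings)  # 70 TB -> 28 TB
    replicated_tb = compressed_tb * replication         # 28 TB -> 84 TB
    return replicated_tb / max_disk_fill                # 84 TB -> 120 TB


if __name__ == "__main__":
    # Rounded because floating-point division is not exact.
    print(round(required_disk_capacity_tb(70), 1))  # prints 120.0
```

<div class="MsoNormal">
Changing any one assumption (a different codec, a lower replication factor for cold data) shows immediately how sensitive the final figure is to it.</div>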
<div class="MsoNormal">
<br /></div>
<h3>
Number of nodes<o:p></o:p></h3>
<div class="MsoNormal" style="background: white; line-height: 14.25pt; margin-bottom: 9.0pt;">
Here are the recommended specifications for DataNode/TaskTrackers in a
balanced Hadoop cluster<span style="color: #505050; font-family: "helvetica neue"; font-size: 10.0pt;"> <a href="http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/">from
Cloudera</a>:<o:p></o:p></span></div>
<div class="MsoNormal">
<br />
<ul style="text-align: left;">
<li>12 to 24 hard disks of 1 to 4 TB each, in a JBOD (Just a Bunch Of Disks)
configuration (no RAID, please!)</li>
<li>multi-core CPUs (say, 12 cores), running at least 2-2.5GHz</li>
</ul>
<o:p></o:p></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
So let’s divide the capacity we planned for by
the storage each node provides. With 12 disks of 1 TB per node: 120 TB / (12 * 1 TB) = 10
nodes.</div>
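<div class="MsoNormal">
As a quick sketch of this step (the function and parameter names are hypothetical), the node count is the planned capacity divided by the storage per node, rounded up since you cannot buy a fraction of a machine:</div>

```python
import math


def data_nodes_needed(capacity_tb, disks_per_node=12, disk_size_tb=1):
    """Number of data nodes needed to hold capacity_tb of disk.

    Defaults match the example: 12 disks of 1 TB per node.
    """
    per_node_tb = disks_per_node * disk_size_tb
    return math.ceil(capacity_tb / per_node_tb)


if __name__ == "__main__":
    print(data_nodes_needed(120))                  # prints 10
    print(data_nodes_needed(120, disk_size_tb=4))  # 120 / 48 = 2.5, prints 3
```

<div class="MsoNormal">
Note that fewer, denser nodes also mean fewer cores and less memory in the cluster overall, so disk size should not be chosen on storage alone.</div>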
<div class="MsoNormal">
How about the number of tasks for each node?</div>
<h3>
Number of tasks per node<o:p></o:p></h3>
<div class="MsoNormal">
First, let's figure out the number of tasks per node:</div>
<div class="MsoNormal">
<br />
<ul style="text-align: left;">
<li>As a rule, count 1 core per task. If the jobs are not too heavy
on CPU, the number of tasks can be greater than the number of cores.</li>
<li>Example: 12 cores per machine, with jobs using ~75% of CPU.</li>
<li>Starting from 12 cores per machine, let's assign 14 task slots in total (slightly more than the number of cores is a good rule of thumb),
split as maxMapTasks=8 and maxReduceTasks=6.</li>
<li>Again, this changes in the context of YARN.</li>
</ul>
</div>
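<div class="MsoNormal">
Here is one way to sketch the slot calculation. The oversubscription factor (1.15) and the map/reduce split (60/40) are assumptions picked so that 12 cores give the 14 slots and the 8/6 split used above; they are not Hadoop defaults. The two keys are the Hadoop 1.x TaskTracker properties that cap map and reduce slots per node.</div>

```python
def task_slots(cores, oversubscription=1.15, map_share=0.6):
    """Split a node's task slots between map and reduce (Hadoop 1.x).

    oversubscription and map_share are illustrative assumptions:
    tune them to how CPU-heavy and how map-heavy your jobs are.
    """
    total = round(cores * oversubscription)  # slightly more slots than cores
    map_slots = round(total * map_share)
    reduce_slots = total - map_slots
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


if __name__ == "__main__":
    for name, value in task_slots(12).items():
        print(f"{name} = {value}")
```

<div class="MsoNormal">
For 12 cores this yields 8 map slots and 6 reduce slots, i.e. the 14 slots of the example.</div>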
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br />
<h3>
Memory</h3>
</div>
<div class="MsoNormal">
Now let's figure out the memory we can assign to these tasks. By default, the TaskTracker and the DataNode each take up 1 GB of
RAM.<br />
<br />
<ul style="text-align: left;">
<li>For each task, count the heap set by <i><span style="color: #333333; font-family: "arial"; font-size: 10.5pt;">mapred.child.java.opts</span></i><span style="color: #333333; font-family: "arial"; font-size: 10.5pt;"> </span>(200 MB by default) of RAM. </li>
<li>In addition, count 2 GB
for the OS. So, say, with 24 GB of memory on the machine:</li>
<li>24 - 2 (OS) - 2 (TaskTracker and DataNode) = 20 GB available for our 14 tasks,
so we can assign about 1.4 GB to each of our tasks (14 * 1.4 = 19.6 GB).</li>
</ul>
<o:p></o:p></div>
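<div class="MsoNormal">
The memory budget can be sketched the same way. The function below is hypothetical; it sets aside the 2 GB for the OS and the 1 GB each for the TaskTracker and DataNode before dividing what is left among the task slots. If you subtract only the OS reserve (<i>daemons_gb=0</i>), you get 22 GB for the tasks, or roughly 1.5 GB each.</div>

```python
def heap_per_task_mb(total_ram_gb, task_slots, os_gb=2, daemons_gb=2):
    """Heap (in MB) that each task JVM can get on one node.

    os_gb and daemons_gb are the reserves assumed in this post:
    2 GB for the OS, 1 GB each for the TaskTracker and the DataNode.
    """
    available_mb = (total_ram_gb - os_gb - daemons_gb) * 1024
    return available_mb // task_slots  # round down to stay within budget


if __name__ == "__main__":
    heap = heap_per_task_mb(24, 14)
    # Value you would put in mapred.child.java.opts
    print(f"-Xmx{heap}m")  # prints -Xmx1462m
```

<div class="MsoNormal">
Rounding down is deliberate: it is safer to leave a little memory unused than to push the node into swapping.</div>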
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
YARN</h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In YARN, the arbitrary fixed limits on
the number of mapper and reducer slots assigned to each cluster node disappear:
the notion of fixed slots has been discarded, and resources are now configured
in terms of amounts of memory (in megabytes) and CPU (“v-cores”). YARN uses <i style="mso-bidi-font-style: normal;">yarn.nodemanager.resource.memory-mb </i>and <i style="mso-bidi-font-style: normal;">yarn.nodemanager.resource.cpu-vcores</i>,
which set the amount of memory and CPU on each node that is available to both
maps and reduces. If you configure these manually, set them to the amount
of memory and the number of <i style="mso-bidi-font-style: normal;">cores</i> on the
machine, after subtracting the resources needed for other services. See <span style="background: white; color: #666666; font-family: "arial"; font-size: 10.0pt;"><a href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_mapreduce_to_yarn_migrate.html">http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_mapreduce_to_yarn_migrate.html</a>
</span>for more details.</div>
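<div class="MsoNormal">
A minimal sketch of that subtraction, for the same 24 GB / 12-core machine. The reserves (4 GB of RAM and 2 cores for the OS and the other services on the node) are assumptions for illustration only, and the function name is made up; the two keys are the YARN properties named above.</div>

```python
def nodemanager_resources(total_ram_gb, total_cores,
                          reserved_ram_gb=4, reserved_cores=2):
    """Values for the two NodeManager resource properties on one node.

    reserved_ram_gb and reserved_cores are assumed reserves for the OS
    and other services; adjust them to what actually runs on the node.
    """
    return {
        "yarn.nodemanager.resource.memory-mb":
            (total_ram_gb - reserved_ram_gb) * 1024,
        "yarn.nodemanager.resource.cpu-vcores":
            total_cores - reserved_cores,
    }


if __name__ == "__main__":
    for name, value in nodemanager_resources(24, 12).items():
        print(f"{name} = {value}")
```

<div class="MsoNormal">
With these assumed reserves, the node offers 20480 MB and 10 v-cores to YARN containers.</div>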
<div class="MsoNormal">
<br /></div>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" Priority="39" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" Name="toc 9"/>
<w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" Priority="10" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" Priority="11" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" Priority="22" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" Priority="59" SemiHidden="false"
UnhideWhenUsed="false" Name="Table Grid"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List"/>
</w:LatentStyles>
</xml><![endif]-->
</div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-19656091851384820842014-05-21T20:46:00.000-07:002014-05-21T20:59:45.765-07:00Apache Pig: the good, the bad, the ugly<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
I recently worked on a project involving Apache Pig. I’ve
been using Hadoop for quite some time (~3 years), and this was the first time I
was actually delving into Pig. Here are some of my notes and comments about the
tool.</div>
<h1>
The Good</h1>
<h2>
A tool for developers<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Pig is a great tool for developers: it is a scripting
language that lets you code your ETL pipeline at an abstraction layer above
Hadoop. It has an edge over Hive in that you can debug your script
step by step, with help from plug-ins such as <a href="https://wiki.apache.org/pig/PigPen">PigPen</a>. It is the scripting
language that Datameer is missing once the number of worksheets in your ETL becomes
unmanageable (IBM’s BigSheets at least translates your sheets into Pig). <o:p></o:p></div>
<div class="MsoNormal">
Pig is truly integrated in the Hadoop ecosystem: it can
use the latest storage formats such as <a href="http://parquet.io/">Parquet</a>,
can be coded <a href="http://gethue.com/category/pig/">in the Hue</a>
environment, and has even been ported to other frameworks like Spark (<a href="https://github.com/mateiz/spork">Spork</a>). <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Here is an example of productionizing Pig that shows
how well integrated Pig is with the rest of the tools:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraphCxSpMiddle">
<blockquote class="tr_bq">
<br />
<br />
1. Create/load your data in HDFS, e.g.:</blockquote>
<br />
1|2002|matt|lieber|..</div>
<div class="MsoListParagraphCxSpMiddle">
and save it in, say, /hdfs/data/mydata/mydata.txt (or copy it from your local directory via hadoop fs -copyFromLocal). The data in this case is pipe-separated. <br />
<br />
<blockquote class="tr_bq">
2. Create your Hive table script, in say mytable.hql:<br />
<br />
<br />
drop table if exists ${hiveconf:SCHEMANAME}.mytable;<br />
<br />
create external table ${hiveconf:SCHEMANAME}.mytable
(
<br />
a_m string,
<br />
name_sk string,
<br />
...<br />
spcl_pgm_cd string
<br />
)
row format delimited fields terminated by '|' location '${hiveconf:HDFSDIR}';</blockquote>
<br />
3. Run it: <span style="text-indent: -0.25in;"> hive -f mytable.hql -hiveconf SCHEMANAME=myschema -hiveconf HDFSDIR='/hdfs/data/mydata'; </span><br />
<span style="text-indent: -0.25in;"><br /></span>
<span style="text-indent: -0.25in;">4. You can now access the data in this Hive table from your Pig script, assuming HCatalog is installed (it is enabled by default in both CDH and HDP these days):</span><br />
<span style="text-indent: -0.25in;"><br /></span>
<br />
<div class="p1">
pig -f myscript.pig -p SCHEMANAME=myschema -p PROCESS_DATE=20140416 -useHCatalog</div>
</div>
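<div class="MsoNormal">
For illustration, here is a minimal sketch of what myscript.pig could contain. The contents are hypothetical: the output path is made up, the column names are taken from the Hive script above, and the loader class shown is the org.apache.hive.hcatalog one (older HCatalog releases shipped it as org.apache.hcatalog.pig.HCatLoader), so check which one your distribution provides.</div>

```pig
-- myscript.pig (hypothetical contents, for illustration only)
-- $SCHEMANAME and $PROCESS_DATE are filled in from the -p flags on the command line.
raw = LOAD '$SCHEMANAME.mytable' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Keep only the columns needed downstream; the schema comes from the Hive metastore.
slim = FOREACH raw GENERATE a_m, name_sk, spcl_pgm_cd;

-- The output path is an assumption; adjust it to your own layout.
STORE slim INTO '/hdfs/out/mytable_$PROCESS_DATE' USING PigStorage('|');
```

<div class="MsoNormal">
Because the schema is read from the metastore, the script needs no AS clause on the LOAD.</div>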
<h2>
Pig versions<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I used Pig 0.12, which contains a lot of new built-in
functions and operations, and I was able to completely avoid writing UDFs (which
are a dicey choice in my opinion: you don’t necessarily know how to optimize
them, and any change in your business requirements means
recompiling and retesting your Java/streaming code). <o:p></o:p></div>
<div class="MsoNormal">
Obviously the latest version of Pig supports Hadoop 2.0. <o:p></o:p></div>
<div class="MsoNormal">
Documentation is pretty good, as long as you get the right
version! A Google search doesn’t necessarily point you to the right version.
For reference, the 0.12 documentation is <a href="http://pig.apache.org/docs/r0.12.0/func.html">here</a>.<o:p></o:p></div>
<div align="right" class="MsoNormal" style="text-align: right;">
<br /></div>
<div align="right" class="MsoNormal" style="text-align: right;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
The Bad<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<h2>
Operations and Built-in functions<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
There are surprisingly few built-in functions and operations that
you truly need in Pig: 90% of the time for my ETL, I didn’t need any of the
fancy new operations; instead, I used FOREACH, conditional expressions (the ?: bincond operator), JOIN,
COGROUP, GROUP. That’s it. <o:p></o:p></div>
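<div class="MsoNormal">
As a rough sketch of how far those few operators go, the following chains a JOIN, a FOREACH with a bincond, and a GROUP with an aggregate. The file names, relations and fields are all made up for the example.</div>

```pig
-- Hypothetical inputs: two pipe-separated files with made-up fields.
orders = LOAD 'orders.txt' USING PigStorage('|')
         AS (order_id:int, cust_id:int, amount:double);
custs  = LOAD 'customers.txt' USING PigStorage('|')
         AS (cust_id:int, name:chararray);

-- JOIN: fields in the result are prefixed with the source alias (orders::, custs::).
joined = JOIN orders BY cust_id, custs BY cust_id;

-- FOREACH with a bincond (?:) to derive a flag column.
flagged = FOREACH joined GENERATE
              custs::name    AS name,
              orders::amount AS amount,
              (orders::amount > 100.0 ? 'large' : 'small') AS bucket;

-- GROUP, then aggregate over the bag that GROUP produces.
by_name = GROUP flagged BY name;
totals  = FOREACH by_name GENERATE group AS name, SUM(flagged.amount) AS total;

DUMP totals;
```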
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
Which is good and bad: it is simple to learn, but it can
get amazingly complex and cumbersome to use. <o:p></o:p></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
An example: my project had to do with performing ETL
across some tables from an old Teradata system and ingesting the result into
HDFS files instead. <o:p></o:p></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
One specific requirement had to do with joining
2 tables, deduplicating, then comparing the values of column c1 with each other; if
they are all equal, map column c2’s value to a new table; if not, map the sum of column c3’s
values to that new table.<o:p></o:p></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
Here is the code in Pig:<o:p></o:p></div>
<div>
<br /></div>
-- Join<br />
<div>
<div class="p1">
-- Join A_SRVC and A_SRVC_FIN on A_SK, A_SRVC_SK, ID, MOD_ID, SRVC_LINE_NUM<br />
-- (the alias is 'joined' because 'join' is a reserved word in Pig)<br />
joined = JOIN srvc by (a_sk, a_srvc_sk, id, mod_id, srvc_line_num) <br />
, srvc_fin by (a_sk, a_srvc_sk, id, mod_id, srvc_line_num); </div>
<div class="p1">
<br /></div>
<div class="p1">
</div>
<div class="p1">
-- If multiple records present, pick the one with the latest A_SRVC_FIN.SRC_UPDT_DTTM<br />
uniq_ltst_detail_srvc_fin_grp = GROUP joined BY (srvc_fin::a_srvc_fin_sk, <br />
srvc::a_sk, srvc::id, <br />
srvc::mod_id, srvc::srvc_line_num);<br />
derive_unique_fin_dedup = FOREACH uniq_ltst_detail_srvc_fin_grp {<br />
ordered_data = ORDER joined by srvc_fin::src_updt_dttm DESC;<br />
limit_data = LIMIT ordered_data 1;<br />
GENERATE FLATTEN(limit_data);<br />
};</div>
-- First, cast to int to be able to perform the sum
<br />
<br />
<div>
derive_unique_srvc_fin_dedup2 = FOREACH derive_unique_fin_dedup
GENERATE *, (int)srvc_fin::srvc_line_item_chrg_amt as Chrg_amt_int;
</div>
<div>
<br /></div>
<div>
derive_unique_srvc_fin_dedup_grp = group derive_unique_srvc_fin_dedup2 BY (limit_data::srvc_fin::a_srvc_fin_sk,
limit_data::srvc::a_sk,
limit_data::srvc::id,
limit_data::srvc::mod_id,
limit_data::srvc::srvc_line_num); <br />
<br />
<br />
-- Calculate the sum ahead of time
</div>
<div>
Srvc_line_item_amt = FOREACH derive_unique_srvc_fin_dedup_grp
</div>
<div>
GENERATE FLATTEN($1),SUM(derive_unique_srvc_fin_dedup2.Chrg_amt_int) AS Chrg_amt; <br />
<br />
<br />
-- Calculate whether or not we have all the values equal to each other, and flatten everything
</div>
<div>
<br /></div>
<div>
Compare_Charge_amt_dst_grp = GROUP Srvc_line_item_amt BY (srvc_fin::a_srvc_fin_sk,
srvc::a_sk, srvc::id,
srvc::mod_id, srvc::srvc_line_num);
Compare_Charge_amt = FOREACH Compare_Charge_amt_dst_grp
</div>
<div>
{</div>
<div>
Compare_Charge_amt_dst = DISTINCT Srvc_line_item_amt.(derive_unique_srvc_fin_dedup2::limit_data::srvc_fin::tot_chrg_amt);
</div>
<div>
GENERATE FLATTEN($1),</div>
<div>
(COUNT(Compare_Charge_amt_dst)==1 ? 'y' : 'n') as Charge,
</div>
<div>
FLATTEN(Srvc_line_item_amt.(derive_unique_srvc_fin_dedup2::limit_data::srvc_fin::tot_chrg_amt)), FLATTEN(Srvc_line_item_amt.Chrg_amt);
</div>
<div>
};</div>
<div>
Compare_Charge_amt2 = GROUP Compare_Charge_amt ALL;
</div>
<div>
Compare_Charge_amt3 = FOREACH Compare_Charge_amt2 </div>
<div>
GENERATE FLATTEN($1);<br />
<br />
<br />
-- Final calculation/output
</div>
<div>
Compare_Charge_amt4 = FOREACH Compare_Charge_amt3
</div>
<div>
GENERATE (Compare_Charge_amt.Charge=='y' ?
Compare_Charge_amt::null::derive_unique_srvc_fin_dedup2::limit_data::srvc_fin::tot_chrg_amt :
(chararray)Compare_Charge_amt::null::Chrg_amt);<br />
<br />
<br />
As you can see in the code, I first needed to calculate the SUM ahead of time, then determine whether the values were all equal to each other (via a COUNT==1 comparison on the DISTINCT values, neat trick!), and FLATTEN everything beforehand; otherwise not everything would be at the same 'level': some data was in bags, some in tuples, and some were scalars. Then I had to FLATTEN everything again and finish the calculation. If there is a better/more elegant way to do this, I am all ears, but essentially my problem stems from the fact that after multiple groupings and joins, the data is deeply nested and not easy to get to in Pig. FLATTEN only accepts a plain bag or tuple, *not* an expression on them (i.e. you cannot do FLATTEN(A.a == 3 ? 'y' : 'n')), which makes things difficult.</div>
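<div>
One possible simplification, offered only as a sketch and not tested against the real tables: do the "all equal?" check and the aggregation inside a single nested FOREACH, so nothing has to be flattened and regrouped in between. The relation and fields (recs, grp_key, c1, c2, c3) are hypothetical stand-ins for the deduplicated data, and it assumes that taking MAX of c2 is acceptable when all c1 values agree.</div>

```pig
-- Hypothetical, already-deduplicated input with made-up fields.
recs = LOAD 'recs.txt' USING PigStorage('|')
       AS (grp_key:chararray, c1:int, c2:long, c3:long);

by_key = GROUP recs BY grp_key;

result = FOREACH by_key {
    -- Distinct c1 values inside this group; exactly one means "all equal".
    c1_vals = DISTINCT recs.c1;
    GENERATE group AS grp_key,
             (COUNT(c1_vals) == 1 ? MAX(recs.c2) : SUM(recs.c3)) AS out_amt;
};

DUMP result;
```

<div>
Both branches of the bincond are kept as long here so that their types match.</div>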
<div>
In any case, DESCRIBE is your friend: it shows you what sort of data structure you are ending up with.<br />
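<div class="MsoNormal">
A small sketch of using DESCRIBE after each step to see where the nesting comes from. The relations t1 and t2 and their fields are invented for the example.</div>

```pig
-- Hypothetical relations, used only to show what DESCRIBE reports.
t1 = LOAD 't1.txt' USING PigStorage('|') AS (id:int, amt:int);
t2 = LOAD 't2.txt' USING PigStorage('|') AS (id:int, updt:chararray);

jnd = JOIN t1 BY id, t2 BY id;
DESCRIBE jnd;   -- flat tuple; fields are prefixed t1:: and t2::

grp = GROUP jnd BY t1::id;
DESCRIBE grp;   -- two fields: 'group' and a bag named 'jnd' holding the joined tuples
```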
<br />
<div class="MsoNormal">
<br /></div>
<h1>
The Ugly<o:p></o:p></h1>
<h2>
Error messages</h2>
<h2>
<div class="MsoNormal" style="font-size: medium; font-weight: normal;">
Error messages in Pig are, in 50% of cases, unfriendly and unhelpful. If the problem goes beyond a syntax error or a field disambiguation issue, Pig will throw a cryptic message that just tells you: "something is wrong, figure it out".<br />
Example of an error:<br />
<br />
<div class="p1">
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:<br />
&lt;line 37, column 32&gt; expression is not a project expression: (Name: UserFunc(org.apache.pig.builtin.CONCAT) Type: null Uid: null)</div>
<div class="p1">
<br /></div>
<div class="p1">
That actually meant that my CONCAT call was not being given the right set of arguments.</div>
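<div class="p1">
For reference, the Pig 0.12 documentation describes CONCAT as taking exactly two expressions, so joining three or more values means nesting the calls. Whether that was the cause of the error above is an assumption; the sketch below uses made-up fields just to show the nesting.</div>

```pig
-- Hypothetical input with two name fields.
people = LOAD 'people.txt' USING PigStorage('|')
         AS (first_name:chararray, last_name:chararray);

-- Two-argument CONCAT calls, nested to join three values.
named = FOREACH people GENERATE
            CONCAT(first_name, CONCAT(' ', last_name)) AS full_name;

DUMP named;
```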
<div class="p1">
<br /></div>
</div>
<div class="MsoNormal" style="font-size: medium; font-weight: normal;">
</div>
</h2>
<h2>
Dereferencing</h2>
<div class="MsoNormal">
Dereferencing is mostly a pain to work with; see the code above. Again, plugins in Eclipse will help, but it still makes the code rather less readable. Consider this code:<br />
<br />
Compare_Charge_amt_dst = DISTINCT Srvc_line_item_amt.(derive_unique_srvc_fin_dedup2::limit_data::srvc_fin::tot_chrg_amt);
<br />
<br />
Here I am disambiguating my fields via '::', inside the parenthesized form of the dereference operator, '.( )'. It took me *a whole day* to find out how to write this syntax.</div>
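<div class="MsoNormal">
One way to keep the '::' chains short, sketched here with invented relations and fields, is to project with plain aliases right after the JOIN and to rename the fields again when flattening.</div>

```pig
-- Hypothetical relations; only the renaming pattern matters.
t1 = LOAD 't1.txt' USING PigStorage('|') AS (id:int, amt:int);
t2 = LOAD 't2.txt' USING PigStorage('|') AS (id:int, updt:chararray);
jnd = JOIN t1 BY id, t2 BY id;

-- Project once with short aliases so later steps do not need t1:: / t2:: prefixes.
flat = FOREACH jnd GENERATE t1::id AS id, t1::amt AS amt, t2::updt AS updt;

-- Keep the latest record per id, renaming the flattened fields on the way out.
grp = GROUP flat BY id;
latest = FOREACH grp {
    sorted = ORDER flat BY updt DESC;
    top1   = LIMIT sorted 1;
    GENERATE FLATTEN(top1) AS (id, amt, updt);
};

DESCRIBE latest;
```

<div class="MsoNormal">
The trade-off is an extra FOREACH per join, in exchange for field names that stay readable several steps later.</div>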
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
PigUnit</h2>
<div class="MsoNormal">
Seriously, I have to go back to Java to implement my PigUnit
tests? <o:p></o:p></div>
<div class="MsoNormal">
And surprise, PigUnit doesn’t support HCatalog, which
means that I have to load my complete schema “by hand”: all 12 tables with ~50
fields each. Not great.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Error Handling</h2>
<div class="MsoNormal">
There is no good way to handle errors in Pig (other than using the nice SPLIT operator based on conditions). I thought that Apache Falcon, the new data management solution, would help, but it is only useful for general coordination of data pipelines (i.e. CRON-like workflows via Oozie) and lifecycle management. As of May 2014, it cannot even take in error messages from Pig and create notifications via JMS!<br />
<br />
<br /></div>
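<div class="MsoNormal">
As a sketch of the SPLIT approach, with made-up file names and fields: load the suspect column as text, cast it, and route rows whose cast came back null to a rejects output. Note that rows where the value was already empty end up in the rejects too.</div>

```pig
-- Hypothetical input; amt is loaded as text so bad values survive the load.
recs = LOAD 'input.txt' USING PigStorage('|') AS (id:chararray, amt:chararray);

-- A failed cast yields null instead of failing the job.
typed = FOREACH recs GENERATE id, amt AS amt_raw, (int)amt AS amt_int;

-- Route clean rows and rejects to separate outputs.
SPLIT typed INTO good IF amt_int IS NOT NULL, bad OTHERWISE;

STORE good INTO 'out/good'    USING PigStorage('|');
STORE bad  INTO 'out/rejects' USING PigStorage('|');
```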
<div class="MsoNormal">
<h1>
Conclusion</h1>
</div>
<div class="MsoNormal">
It has been a nice but steep road to Pig enlightenment; my ETL pipeline works, but I wonder whether Cascading or similar packages might have been a more elegant solution. Don't get me wrong, Pig is a nice tool - but like anything, </div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com4tag:blogger.com,1999:blog-170648781806274754.post-17395899783699270292014-04-29T18:21:00.001-07:002014-04-29T18:21:05.131-07:00Beating the stock market with Big Data !This is an <a href="http://www.datameer.com/blog/uncategorized/predicting-the-stock-market-with-datameer.html">old post of mine</a>, that still applies to Hadoop ..<br />
<br />
<br />Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-81220450724769927152014-04-14T16:46:00.003-07:002014-04-14T16:46:22.980-07:00ETL with Hadoop<br />
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
<!--StartFragment-->
<div class="MsoNormal">
<br /></div>
<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<div class="MsoTitle">
ETL with Hadoop<o:p></o:p></div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">Here are a few pointers about
how to do ETL with the common set of Hadoop tools.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h1>
Extraction / Ingestion<o:p></o:p></h1>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">First comes the ingestion of
data from the various data sources. The data is stored raw in HDFS, in what is known
as the Data Lake (Hortonworks) or the Enterprise Data Hub (Cloudera), and we
simply project a descriptive schema onto it (the “schema on read” <a href="http://www.slideshare.net/awadallah/schemaonread-vs-schemaonwrite">concept</a>
at Cloudera, also called “late binding”
at Hortonworks). Caution: tools like Hive will silently skip over data that doesn’t
match the schema (mismatched fields typically come back as NULL), instead of warning you or stopping with an error as a
traditional RDBMS or ETL tool would.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">Different tools come to mind
for ingestion:<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-size: 14.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;">-<span style="font-family: 'Times New Roman'; font-size: 7pt;">
</span></span><!--[endif]--><span style="font-size: 14.0pt;">Use <b>Flume</b> (Flume NG) for event-driven data
(e.g. web logs, where the logs of a collection of web servers are
aggregated into HDFS for later analysis). This is roughly equivalent to using Apache
Kafka with Camus.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-size: 14.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;">-<span style="font-family: 'Times New Roman'; font-size: 7pt;">
</span></span><!--[endif]--><span style="font-size: 14.0pt;">Use <b>Sqoop</b> for RDBMS data, generally via a
JDBC connector. There are also special connectors for Teradata (with the
FastExport utility) and mainframes. <o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="line-height: 19.0pt; mso-layout-grid-align: none; mso-list: l0 level1 lfo1; mso-pagination: none; text-autospace: none; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #3f3f3f; font-size: 14.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;">-<span style="font-family: 'Times New Roman'; font-size: 7pt; line-height: normal;"> </span></span><!--[endif]--><b><span style="font-size: 14.0pt;">WebHDFS</span></b><span style="font-size: 14.0pt;">: exposes a
REST endpoint for moving data into Hadoop. A typical use case involves an ESB.</span><span style="color: #3f3f3f; font-family: "Helvetica Neue"; font-size: 14.0pt; mso-bidi-font-family: "Helvetica Neue";"><o:p></o:p></span></div>
<h1>
Load<o:p></o:p></h1>
<h1>
<span style="color: windowtext; font-family: Cambria; font-size: 14.0pt; font-weight: normal; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-weight: bold; mso-bidi-theme-font: minor-bidi; mso-hansi-theme-font: minor-latin;">A common problem when loading data from an RDBMS is the choice between full
and incremental loads.<o:p></o:p></span></h1>
<h1>
<span style="color: windowtext; font-family: Cambria; font-size: 14.0pt; font-weight: normal; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-weight: bold; mso-bidi-theme-font: minor-bidi; mso-hansi-theme-font: minor-latin;">The easiest case is when we can use a full data load. In the case of an
incremental load, however, there are three outstanding issues:<o:p></o:p></span></h1>
<div class="MsoNormal">
<br /></div>
<h2>
Seeding<o:p></o:p></h2>
<h1 style="margin-left: .5in; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: windowtext; font-family: Cambria; font-size: 14.0pt; font-weight: normal; mso-bidi-font-family: Cambria; mso-bidi-font-weight: bold; mso-fareast-font-family: Cambria;">-<span style="font-family: 'Times New Roman'; font-size: 7pt;">
</span></span><!--[endif]--><span style="color: windowtext; font-family: Cambria; font-size: 14.0pt; font-weight: normal; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-weight: bold; mso-bidi-theme-font: minor-bidi; mso-hansi-theme-font: minor-latin;">The initial load, or </span><span style="color: windowtext; font-family: Cambria; font-size: 14.0pt; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi; mso-hansi-theme-font: minor-latin;">seeding of the data</span><span style="color: windowtext; font-family: Cambria; font-size: 14.0pt; font-weight: normal; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-weight: bold; mso-bidi-theme-font: minor-bidi; mso-hansi-theme-font: minor-latin;">. If the initial data volume is big, using Sqoop can overwhelm the RDBMS
by opening too many connections to parallelize the transfer, especially if that
database is serving an application that is under strict SLAs. Instead,
it is better to take an initial data dump from the RDBMS and feed it into Hadoop
in chunks.<o:p></o:p></span></h1>
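<div class="MsoNormal">
<span style="font-size: 14.0pt;">If Sqoop is still used for the later incremental loads, one way to limit the pressure on the source database is to cap the number of parallel mappers, since each mapper opens its own connection. Below is a minimal Python sketch that only assembles the command line; the JDBC URL, table, column and HDFS path are made-up placeholders, and the right mapper count depends on your database.</span></div>

```python
def sqoop_import_args(jdbc_url, table, target_dir, num_mappers=2,
                      check_column=None, last_value=0):
    """Build the argument list for a `sqoop import` with a capped mapper count."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Each mapper opens its own database connection, so a small value
        # limits the load placed on the source RDBMS.
        "--num-mappers", str(num_mappers),
    ]
    if check_column is not None:
        # Incremental append: import only rows whose check column
        # is greater than the last value already loaded.
        args += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", str(last_value),
        ]
    return args


if __name__ == "__main__":
    # Hypothetical source database and target directory, for illustration only.
    print(" ".join(sqoop_import_args(
        "jdbc:mysql://dbhost/sales", "orders", "/data/raw/orders",
        num_mappers=2, check_column="order_id", last_value=1000)))
```

<div class="MsoNormal">
<span style="font-size: 14.0pt;">The resulting list can be handed to a scheduler or to subprocess.run once the values have been reviewed.</span></div>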
<h2>
Scheduling<o:p></o:p></h2>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="color: #3f3f3f; font-size: 13.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;">-<span style="font-family: 'Times New Roman'; font-size: 7pt;">
</span></span><!--[endif]--><span style="font-size: 14.0pt;">How do you know
when the <b>scheduled batch is ready</b>? Based
on the delta changes that have happened during that day? Based on a fixed
time (e.g. “5am should be enough time for the data to be ready”)? Or by polling
the data source? None of these schemes is ideal. A better model is to use the <a href="https://cwiki.apache.org/confluence/display/Hive/HCatalog+Notification">event-driven
mechanism</a> built into HCatalog to let you know when the data is ready. You
can even chain processing events</span><span style="color: #3f3f3f; font-family: "Helvetica Neue"; font-size: 13.0pt; mso-bidi-font-family: "Helvetica Neue";">!<o:p></o:p></span></div>
<h2>
Appends<span style="color: #3f3f3f; font-family: "Helvetica Neue"; mso-bidi-font-family: "Helvetica Neue";"><o:p></o:p></span></h2>
<div class="MsoListParagraph" style="mso-layout-grid-align: none; mso-list: l0 level1 lfo1; mso-pagination: none; text-autospace: none; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-size: 14.0pt; mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;">-<span style="font-family: 'Times New Roman'; font-size: 7pt;">
</span></span><!--[endif]--><b><span style="font-size: 14.0pt;">Appending data in HDFS</span></b><span style="color: #3f3f3f; font-family: "Helvetica Neue"; font-size: 13.0pt; mso-bidi-font-family: "Helvetica Neue";">. </span><span style="font-size: 14.0pt;">As live data gets
updated, how do we reflect this in HDFS, which holds static (write-once) data by design?
Firstly, Hive supports appends through partitions: INSERT INTO appends
to the table or partition, keeping the existing data intact. If a partition
column value is given in the statement, we call this a static partition; otherwise it is a
dynamic partition, whose value is taken from the corresponding input column of the select
statement. Secondly, updates can be supported
in Hive by using HBase along with it: periodic loads come from HBase,
which is continuously being updated, and Hive queries sit on top of it. E.g., create your Hive table like this:<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: Courier; mso-bidi-font-family: Courier;"> create table ...<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: Courier; mso-bidi-font-family: Courier;"> STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: Courier; mso-bidi-font-family: Courier;"> WITH SERDEPROPERTIES
("hbase.columns.mapping" = ....);</span><span style="font-size: 14.0pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">Using INSERT OVERWRITE
statements will update the rows for a given row key (given that the row keys
are unique). <o:p></o:p></span></div>
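<div class="MsoNormal">
<span style="font-size: 14.0pt;">To make the static vs. dynamic partition distinction mentioned above concrete, here is a small Python sketch that only generates the two flavors of INSERT INTO statement. The table and column names are hypothetical; the two SET lines are the usual Hive settings for allowing fully dynamic partitions, but check them against your Hive version.</span></div>

```python
# Settings commonly needed before a fully dynamic partition insert.
DYNAMIC_PARTITION_SETTINGS = [
    "SET hive.exec.dynamic.partition=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict",
]


def insert_into_partition(table, source, columns, partition_col,
                          partition_value=None):
    """Return an INSERT INTO statement that appends to a partitioned table."""
    cols = ", ".join(columns)
    if partition_value is not None:
        # Static partition: the value is fixed in the PARTITION clause.
        return (f"INSERT INTO TABLE {table} "
                f"PARTITION ({partition_col}='{partition_value}') "
                f"SELECT {cols} FROM {source}")
    # Dynamic partition: Hive takes the partition value from the last
    # column of the SELECT, row by row.
    return (f"INSERT INTO TABLE {table} PARTITION ({partition_col}) "
            f"SELECT {cols}, {partition_col} FROM {source}")


if __name__ == "__main__":
    # Hypothetical tables: 'events' is partitioned by 'dt', 'staging' is the input.
    print(insert_into_partition("events", "staging", ["id", "payload"],
                                "dt", "2014-03-01"))
    for setting in DYNAMIC_PARTITION_SETTINGS:
        print(setting)
    print(insert_into_partition("events", "staging", ["id", "payload"], "dt"))
```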
<div class="MsoNormal">
<br /></div>
<h2>
Orchestration<o:p></o:p></h2>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">Orchestration today is
typically done with Oozie, but Oozie is very XML-heavy; a higher-level alternative (mainly
supported by Hortonworks) is Falcon.</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h1>
Transform<span style="font-size: 14.0pt;"><o:p></o:p></span></h1>
<div class="MsoNormal">
<br /></div>
<h2>
Tool<o:p></o:p></h2>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">Transformation can be done
in Hive or Pig. Pig is a little more flexible and convenient to use in a data
pipeline, as it is a procedural language, whereas Hive is more
declarative. Pig therefore lets you checkpoint the data between steps, which is
convenient in case of failures. It is also a little easier to
integrate standard code with Pig than with Hive. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">Of course, there are also
programming-language frameworks such as Cascading or Scalding.</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Data format<o:p></o:p></h2>
<div class="MsoNormal">
<span style="font-size: 14.0pt;">Sometimes it makes more sense
to use a different format than the original one (usually text/CSV, JSON or
XML). For example, XML data is typically converted into an Apache Avro container, as Avro is
schema-rich and supports schema evolution. ORC and Parquet are the trend
these days, as columnar storage is the most efficient in terms of both storage footprint and
query performance. From testing, Parquet seems a bit faster, while ORC embeds statistics;
both are portable across tools via HCatalog, so portability
shouldn’t be an issue. That said, Parquet was also designed from the ground up to
be language-independent.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<!--EndFragment-->Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-86197553538917357312014-03-31T14:29:00.000-07:002014-03-31T14:30:17.080-07:00How to set up Apache SolrCloud<div class="MsoNormal">
<br /></div>
<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<div class="MsoTitle">
<h2>
How to set up SolrCloud</h2>
<o:p></o:p></div>
</div>
<div class="MsoNormal">
<br /></div>
<h1>
Definitions<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
SolrCloud is used to scale out Apache Solr across multiple
machines: we can set up a collection (a single search index, logically grouped)
on multiple shards (logical sections of a single collection) that each serve
requests. This is done by splitting the index into
multiple cores (physical indexes) residing on multiple physical nodes, which form
a cluster. If the request volume increases, we can set up multiple copies of each
core across the nodes, called replicas (the original core is called the
leader). Of note is the fact that coordination is handled by ZooKeeper, a third-party
service, as opposed to an internal communication protocol such as
Gossip in Apache Cassandra.<o:p></o:p></div>
<div class="MsoNormal">
So scaling out in SolrCloud is done by sharding, i.e. adding
more nodes that have multiple cores of the collection, including replicas.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Shard:<o:p></o:p></div>
<div class="MsoNormal">
A logical section of a single collection. Sometimes people
will talk about "Shard" in a physical sense (a manifestation of a
logical shard)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Replica:<o:p></o:p></div>
<div class="MsoNormal">
A physical manifestation of a logical Shard, implemented as
a single Lucene index on a SolrCore<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Leader:<o:p></o:p></div>
<div class="MsoNormal">
One Replica of every Shard will be designated as a Leader to
coordinate indexing for that Shard<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
SolrCore:<o:p></o:p></div>
<div class="MsoNormal">
Encapsulates a single physical index. One or more make up
logical shards (or slices) which make up a collection.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Node:<o:p></o:p></div>
<div class="MsoNormal">
A single instance of Solr. A single Solr instance can have
multiple SolrCores that can be part of any number of collections.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Cluster:<o:p></o:p></div>
<div class="MsoNormal">
All of the nodes you are using to host SolrCores.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
Script to run SolrCloud:<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<h2>
Create configuration folders.<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A Solr instance is made up of:<o:p></o:p></div>
<div class="MsoNormal">
a conf directory that holds the configuration of each collection to
be indexed,<o:p></o:p></div>
<div class="MsoNormal">
i.e. solr->conf->collection1.<o:p></o:p></div>
<div class="MsoNormal">
The collection directory itself contains some simple configuration
files:<o:p></o:p></div>
<div class="MsoNormal">
collection1->conf<o:p></o:p></div>
<div class="MsoNormal">
A good way to start is to copy the
solr-xx/example/solr/collection1 directory.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Then change the name of the collection inside of this
directory, in core.properties.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A Solr instance also has a Solr home directory, which
contains solr.xml and zoo.cfg. The Solr home represents a node and will contain the
index data for that node. Each node can be on a different machine. Ours will be
solr1.<o:p></o:p></div>
<div class="MsoNormal">
solr.xml and zoo.cfg can be copied from the original Solr-xx
directory, under example/solr.<o:p></o:p></div>
<div class="MsoNormal">
These files contain parameters that may need to be changed,
like hostname and port numbers.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Install Zookeeper<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Download and install ZooKeeper.<o:p></o:p></div>
<div class="MsoNormal">
Create a data directory and point to it (dataDir) in
zookeeper-xx/conf/zoo.cfg.<o:p></o:p></div>
<div class="MsoNormal">
Also change the client port (clientPort) if needed; the default is 2181.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Start the server with ./zkServer.sh start <o:p></o:p></div>
<div class="MsoNormal">
and check that it is running with ./zkServer.sh status.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Upload configuration into zookeeper<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
First, run Solr by itself; this is required to bootstrap
properly.<o:p></o:p></div>
<div class="MsoNormal">
<span style="background: white; color: black; font-family: Consolas; font-size: 10.5pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";">java -jar start.jar</span><span style="font-family: Times; font-size: 10.0pt; mso-bidi-font-family: "Times New Roman"; mso-fareast-font-family: "Times New Roman";"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
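The zkcli.sh script automates loading our collection configuration
into ZooKeeper: you pass it the ZooKeeper address (localhost:2181 for a standalone ZooKeeper, or localhost:9983 for the one embedded in a Solr node running on port 8983), the
configuration directory of the collection and a name for that configuration, e.g.:<o:p></o:p></div>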
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-spacerun: yes;"> </span>cloud-scripts/zkcli.sh -cmd upconfig -zkhost
localhost:2181 -confdir /Users/mlieber/app/solr/conf/testcollection3/conf
-confname testcollection3<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-spacerun: yes;"> </span>cloud-scripts/zkcli.sh -cmd upconfig -zkhost
localhost:9983 -confdir /Users/mlieber/app/solr/conf/collection1/conf -confname
testcollection<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This is a one-time task.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Run the Solr node<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This is done via the start.jar program found in
solr-xx/example. We can either use the Solr-embedded ZooKeeper:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
java -DzkRun
-Dsolr.solr.home=/Users/mlieber/app/solr/solrhome/ -jar start.jar<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
or, in production, point to our own ZooKeeper instance:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
java -Dsolr.solr.home=/Users/mlieber/app/solr/solrhome1/
-DzkHost=localhost:2181 -jar start.jar<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The solr.solr.home is the directory that was created for
that node.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
If testing this on a single machine with multiple nodes, you
may need to change the jetty port for the 2nd node, and reflect this in the
command:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-spacerun: yes;"> </span>java
-Dsolr.solr.home=/Users/mlieber/app/solr/solrhome1/ -DzkHost=localhost:2181
-Djetty.port=8984 -jar start.jar<o:p></o:p></div>
<div class="MsoNormal">
The jetty.port can also optionally be changed in the node
configuration folder, at solrhome1/solr.xml.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h2>
Use the Collections API to create the collection<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Next, we can create our collection with a REST call to the
Solr Collections API, e.g.:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
curl
'http://localhost:8983/solr/admin/collections?action=CREATE&name=testcollection3&numShards=2&maxShardsPerNode=3&replicationFactor=3'<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=testcollection&numShards=2&maxShardsPerNode=2&replicationFactor=1'<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
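We need to pass the name of the collection being created,
the number of shards (numShards), the replication factor (replicationFactor) and the maximum number of shards per node (maxShardsPerNode). You'll get a useful error if
the combination doesn't fit. E.g. with 2 shards and a replication factor of 2 on a single node, 2 x 2 = 4 cores have to be placed on that node, so you will
need a maxShardsPerNode of at least 4.<o:p></o:p></div>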
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Example:<o:p></o:p></div>
<div class="MsoNormal">
Create a testcollection which has 2 shards and a replication
factor of 2, running on 2 JVMs:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
curl
'http://localhost:8983/solr/admin/collections?action=CREATE&name=testcollection&numShards=2&maxShardsPerNode=2&replicationFactor=2'<o:p></o:p></div>
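<div class="MsoNormal">
The arithmetic behind maxShardsPerNode can be sketched in a few lines of Python. This is only an illustration under the assumption that Solr spreads the cores evenly over the nodes; the helper names are my own and not part of Solr. It computes the smallest value that fits and builds the matching CREATE URL.</div>

```python
import math
from urllib.parse import urlencode


def min_max_shards_per_node(num_shards, replication_factor, num_nodes):
    """Smallest maxShardsPerNode that lets every core be placed on some node."""
    total_cores = num_shards * replication_factor
    # Assumes an even spread of cores across the available nodes.
    return math.ceil(total_cores / num_nodes)


def create_collection_url(base_url, name, num_shards, replication_factor,
                          num_nodes):
    """Build the Collections API CREATE call for the given layout."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "maxShardsPerNode": min_max_shards_per_node(
            num_shards, replication_factor, num_nodes),
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # 2 shards x 2 replicas on a single node: all 4 cores land on that node.
    print(min_max_shards_per_node(2, 2, 1))
    # The same layout on 2 JVMs only needs 2 cores per node.
    print(create_collection_url("http://localhost:8983/solr",
                                "testcollection", 2, 2, 2))
```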
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
You can then add a document to this collection via:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">java -Durl=http://localhost:8983/solr/collection1/update -jar
./example/exampledocs/post.jar ./example/exampledocs/monitor.xml</span><o:p></o:p></div>
<h2>
Add a replica for an existing node<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
You can add a replica to any shard after the initial creation.
The syntax is simply to call the node that should host it, passing the collection, the shard and a name for the new core. E.g.:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
curl
'http://localhost:8984/solr/admin/cores?action=CREATE&collection=testcollection3&shard=shard1&name=testcollection3_shard1_replica4'<o:p></o:p></div>
<div class="MsoNormal">
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&collection=testcollection3&shard=shard1&name=testcollection3_shard1_replica5'<o:p></o:p></div>
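<div class="MsoNormal">
As a rough sketch, the replica URLs above can be generated with a small Python helper. The collection_shardN_replicaM core name simply follows the pattern used in the commands above; it is a naming habit rather than a requirement, and the helper itself is hypothetical.</div>

```python
from urllib.parse import urlencode


def add_replica_url(node_base_url, collection, shard, replica_number):
    """Build the CoreAdmin CREATE call that adds one more core to a shard."""
    # Name the core after the collection, shard and replica number,
    # mirroring the names used in the examples above.
    core_name = f"{collection}_{shard}_replica{replica_number}"
    params = {
        "action": "CREATE",
        "collection": collection,
        "shard": shard,
        "name": core_name,
    }
    return f"{node_base_url}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # The node that receives the call is the one that will host the new core.
    print(add_replica_url("http://localhost:8984/solr",
                          "testcollection3", "shard1", 4))
```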
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Administration<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
You can view and test the configuration from the admin UI,
under Cloud/Tree, clusterstate.json. <o:p></o:p></div>
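<div class="MsoNormal">
If you save the content of clusterstate.json, a few lines of Python are enough to see which core leads each shard. The sample below is hand-written to mimic the layout I have seen in Solr 4.x (collection, then shards, then replicas); the field names may differ in other versions, so treat it as an assumption to verify against your own file.</div>

```python
import json

# Hand-written sample imitating a Solr 4.x clusterstate.json (not real output).
SAMPLE_CLUSTERSTATE = """
{
  "testcollection": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"state": "active", "leader": "true",
                         "core": "testcollection_shard1_replica1",
                         "base_url": "http://localhost:8983/solr"},
          "core_node2": {"state": "active",
                         "core": "testcollection_shard1_replica2",
                         "base_url": "http://localhost:8984/solr"}
        }
      },
      "shard2": {
        "replicas": {
          "core_node3": {"state": "active", "leader": "true",
                         "core": "testcollection_shard2_replica1",
                         "base_url": "http://localhost:8984/solr"}
        }
      }
    }
  }
}
"""


def shard_leaders(clusterstate, collection):
    """Map each shard name to the core currently flagged as its leader."""
    leaders = {}
    for shard_name, shard in clusterstate[collection]["shards"].items():
        for replica in shard["replicas"].values():
            # In this layout the leader flag is the string "true".
            if replica.get("leader") == "true":
                leaders[shard_name] = replica["core"]
    return leaders


if __name__ == "__main__":
    state = json.loads(SAMPLE_CLUSTERSTATE)
    for shard, core in sorted(shard_leaders(state, "testcollection").items()):
        print(shard, "->", core)
```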
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Set up Solr on Tomcat<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
By default, Solr is bundled with Jetty as the web server.
Tomcat is often considered more robust as a servlet container, so it is sometimes
preferable to switch Solr over to Tomcat.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- Copy Solr’s solr.war (usually in $SOLR_HOME<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">/example/webapps/solr.war)
</span>to $TOMCAT_HOME/webapps
to make Tomcat aware of Solr.<o:p></o:p></div>
<div class="MsoNormal">
- Add the following to Tomcat, in the file
'conf/Catalina/localhost/solr.xml', pointing to the location of the solr.war you
copied as well as to your SolrCloud node location.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"><Context path="/solr"
docBase="/app/apache-tomcat-7.0.29/webapps/solr.war"
debug="0" crossContext="true"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"> <Environment name="solr/home"
type="java.lang.String" value="/app/solrnode1"
override="true"/><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"> </Context><o:p></o:p></span></div>
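<div class="MsoNormal">
The descriptor above can also be written out from the shell. This is only a sketch: write_solr_context is a hypothetical helper name (it is not part of Solr or Tomcat), and its three arguments are assumed to be the Tomcat home, the path to the copied solr.war, and the SolrCloud node directory.</div>

```shell
#!/bin/sh
# Hypothetical helper: writes the Solr context descriptor into a Tomcat install.
# $1 = Tomcat home, $2 = path to solr.war, $3 = Solr home (SolrCloud node directory).
write_solr_context() {
  tomcat_home=$1
  war_path=$2
  solr_home=$3
  ctx_dir="$tomcat_home/conf/Catalina/localhost"

  # Create the per-host context directory if Tomcat has not created it yet.
  mkdir -p "$ctx_dir" || return 1

  # Unquoted heredoc delimiter so the shell variables are expanded.
  cat > "$ctx_dir/solr.xml" <<EOF
<Context path="/solr" docBase="$war_path" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$solr_home" override="true"/>
</Context>
EOF
}

# Example call with the paths used in this post:
# write_solr_context /app/apache-tomcat-7.0.29 /app/apache-tomcat-7.0.29/webapps/solr.war /app/solrnode1
```

<div class="MsoNormal">
Generating the file this way keeps the docBase and solr/home values in one place if you set up several nodes.</div>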
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- As a precaution, I was also advised to set URIEncoding="UTF-8" on the HTTP connector in
conf/server.xml:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">vi conf/server.xml - Add the following <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"> <Connector port="8080"
protocol="HTTP/1.1"<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">
connectionTimeout="20000"<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">
redirectPort="8443"<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;"> <b style="mso-bidi-font-weight: normal;">URIEncoding="UTF-8" </b>/><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- cp $SOLR_HOME/example/lib/ext/* $TOMCAT_HOME/lib/<o:p></o:p></div>
<div class="MsoNormal">
- cp $SOLR_HOME/example/resources/log4j.properties $TOMCAT_HOME/lib/<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- Edit catalina.sh and add the following so that Solr knows its host, port, web context, and ZooKeeper address:<o:p></o:p></div>
<div class="MsoNormal">
<span style="color: #032553; font-family: Verdana; font-size: 13.0pt; mso-bidi-font-family: Verdana;"> </span>SOLR_OPTS="-Dhost=localhost
-DhostPort=8080 -DhostContext=solr -DzkClientTimeout=20000
-DzkHost=localhost:2181"<o:p></o:p></div>
<div class="MsoNormal">
- JAVA_OPTS="$JAVA_OPTS $SOLR_OPTS"<o:p></o:p></div>
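<div class="MsoNormal">
As an alternative to editing catalina.sh itself (this is not what the setup above used), the same options can be kept in bin/setenv.sh, which catalina.sh sources at startup when the file exists. A minimal sketch, assuming the host, port and ZooKeeper values from this post:</div>

```shell
# $TOMCAT_HOME/bin/setenv.sh -- sourced by catalina.sh at startup if present.
# Values mirror the ones used in this post; adjust host, port and ZooKeeper address.
SOLR_OPTS="-Dhost=localhost -DhostPort=8080 -DhostContext=solr -DzkClientTimeout=20000 -DzkHost=localhost:2181"

# Append to any JAVA_OPTS already set instead of overwriting them.
JAVA_OPTS="${JAVA_OPTS:-} $SOLR_OPTS"
export JAVA_OPTS
```

<div class="MsoNormal">
Keeping the options in setenv.sh means a Tomcat upgrade that replaces catalina.sh does not silently drop them.</div>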
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
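- Change 'jetty.port' to 'hostPort' in the solr.xml under your Solr home (not the Tomcat context file above), so that the -DhostPort value is picked up<o:p></o:p></div>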
<div class="MsoNormal">
- Start Tomcat:<o:p></o:p></div>
<div class="MsoNormal">
<span style="color: black; font-family: "Menlo Regular"; font-size: 11.0pt;">./catalina.sh start<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Check the Tomcat log, $TOMCAT_HOME/logs/catalina.out, to make sure
everything started cleanly.<o:p></o:p></div>
<div class="MsoNormal">
To view your Solr cores, go to
http://{your-ip-address}:8080/solr<o:p></o:p></div>
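<div class="MsoNormal">
The same check can be done from the command line through the CoreAdmin STATUS action. A small sketch: solr_status_url is a hypothetical helper name, and the default host and port are assumed from the connector configured above.</div>

```shell
#!/bin/sh
# Hypothetical helper: builds the CoreAdmin STATUS URL for Solr running under Tomcat.
# $1 = host (default: localhost), $2 = port (default: 8080, the connector port above).
solr_status_url() {
  host=${1:-localhost}
  port=${2:-8080}
  printf 'http://%s:%s/solr/admin/cores?action=STATUS&wt=json\n' "$host" "$port"
}

# Usage once Tomcat is running (prints a JSON description of each core):
# curl -s "$(solr_status_url localhost 8080)"
```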
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<!--EndFragment--><br />
<div class="MsoNormal">
<br /></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-24411347032278932212014-03-10T15:00:00.002-07:002014-03-10T15:01:57.228-07:00Hadoop data formats, Parquet, and Impala evaluation<iframe src="http://www.slideshare.net/slideshow/embed_code/32145319" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px 1px 0; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="https://www.slideshare.net/mattlieber/parquet-and-impala-overview-external" title="Parquet and impala overview external" target="_blank">Parquet and impala overview external</a> </strong> from <strong><a href="http://www.slideshare.net/mattlieber" target="_blank">mattlieber</a></strong> </div>Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com1tag:blogger.com,1999:blog-170648781806274754.post-30345139176797027272014-02-24T17:22:00.003-08:002014-02-24T19:38:41.967-08:00Cassandra modeling with CQL for a simple time series use case<b><span style="font-size: x-large;">Simple Cassandra modeling example</span></b><br />
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1 style="mso-list: l1 level1 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868536"></a><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618947"><span style="mso-bookmark: _Toc254868536;"><!--[if !supportLists]--><span style="mso-bidi-font-family: Arial; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="mso-bidi-font-family: Arial;">Introduction</span></span></a></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bidi-font-family: Arial;"><o:p></o:p></span></span></span></span></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt;">This document explores different
designs for a simple real-world time series use case on Apache Cassandra. The data initially resided in a relational database (SQL Server), but given its volume we need a more scalable solution. Cassandra was chosen for its scalability, its lack of a single point of failure (SPOF), and its ease of use. We will review our options for the data model, using CQL to construct the table.<o:p></o:p></span></span></span></span></div>
<div class="MsoNormal">
<br /></div>
<h2 style="mso-list: l1 level2 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868537"></a><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618948"><span style="mso-bookmark: _Toc254868537;"><!--[if !supportLists]--><span style="font-style: normal; mso-bidi-font-family: Arial; mso-bidi-font-style: italic; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.1<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="font-style: normal; mso-bidi-font-family: Arial; mso-bidi-font-style: italic;">Scope and requirements</span></span></a></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-style: normal; mso-bidi-font-family: Arial; mso-bidi-font-style: italic;"><o:p></o:p></span></span></span></span></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt;">Devices need to communicate over the
Web (the devices are Wi-Fi-enabled) with their backend cloud infrastructure (IaaS and
PaaS) to send status and retrieve updates. There will be about 100k devices
connected, and it is important that the architecture scales as more
devices are added. The message rate fluctuates, with spikes. The data format is XML.<o:p></o:p></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt;">We need to find the optimal schema for
this data: one that allows fast writes and fast retrieval while minimizing the amount
of space used. </span></span></span></span><br />
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt;">Cassandra is a good choice for storing this data, but how should we design the data schema?<o:p></o:p></span></span></span></span></div>
<div class="MsoNormal">
<br /></div>
<h2 style="mso-list: l1 level2 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868538"></a><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618949"><span style="mso-bookmark: _Toc254868538;"><!--[if !supportLists]--><span style="font-size: 12.0pt; font-style: normal; mso-bidi-font-family: Arial; mso-bidi-font-size: 11.5pt; mso-bidi-font-style: italic; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.2<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="font-style: normal; mso-bidi-font-style: italic;">Data schema & queries</span></span></a></span></span></span><span style="mso-bookmark: _Toc254868538;"></span><span style="mso-bookmark: _Toc254618949;"></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-size: 12.0pt; font-style: normal; mso-bidi-font-size: 11.5pt; mso-bidi-font-style: italic;"><o:p></o:p></span></span></span></span></h2>
<div class="MsoNormal">
The data that we want to store in Cassandra has the format shown below.<br />
<br /></div>
<div class="MsoListParagraph" style="mso-layout-grid-align: none; mso-list: l0 level1 lfo2; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none; text-indent: -.25in;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><!--[if !supportLists]--><span style="font-family: Arial; font-size: 12.0pt; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-language: AR-SA;">Sample data
format</span></b></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt;"><o:p></o:p></span></span></span></span></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<br /></div>
<table border="1" cellpadding="0" cellspacing="0" class="MsoTableLightListAccent5" style="border-collapse: collapse; border: none; mso-border-alt: solid #4BACC6 1.0pt; mso-border-insideh-themecolor: accent5; mso-border-insideh: 1.0pt solid #4BACC6; mso-border-insidev-themecolor: accent5; mso-border-insidev: 1.0pt solid #4BACC6; mso-border-themecolor: accent5; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 1184;">
<tbody>
<tr style="mso-yfti-firstrow: yes; mso-yfti-irow: -1;">
<td style="background: #4BACC6; border: solid #4BACC6 1.0pt; mso-background-themecolor: accent5; mso-border-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 5; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="color: white; font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin; mso-themecolor: background1;">Column
name<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="background: #4BACC6; border-left: none; border: solid #4BACC6 1.0pt; mso-background-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 1; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="color: white; font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin; mso-themecolor: background1;">Sample
value<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="background: #4BACC6; border-left: none; border: solid #4BACC6 1.0pt; mso-background-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 1; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="color: white; font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin; mso-themecolor: background1;">Comment<o:p></o:p></span></b></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 0;">
<td style="border-top: none; border: solid #4BACC6 1.0pt; mso-border-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 68; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Date<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 64; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">02/05/2013<o:p></o:p></span></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 64; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Day's worth of data<o:p></o:p></span></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 1;">
<td style="border-top: none; border: solid #4BACC6 1.0pt; mso-border-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 4; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Device_id<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">As-333-fd-45<o:p></o:p></span></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">unique id for device<o:p></o:p></span></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 2;">
<td style="border-top: none; border: solid #4BACC6 1.0pt; mso-border-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 68; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">timestamp<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 64; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">02:44:45<o:p></o:p></span></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 64; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Time stamp for this date<o:p></o:p></span></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 3;">
<td style="border-top: none; border: solid #4BACC6 1.0pt; mso-border-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 4; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Event_type<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">notification<o:p></o:p></span></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Other data <o:p></o:p></span></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 4;">
<td style="border-top: none; border: solid #4BACC6 1.0pt; mso-border-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div align="center" class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 68; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Message_id<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 64; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">454<o:p></o:p></span></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"></span></span></span>
<br />
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 64; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<br /></div>
</td>
</tr>
<tr style="mso-yfti-irow: 5; mso-yfti-lastrow: yes;">
<td style="border-top: none; border: solid #4BACC6 1.0pt; mso-border-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; mso-yfti-cnfc: 4; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><b><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">Message<o:p></o:p></span></b></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin;">M455<o:p></o:p></span></span></span></span></div>
</td>
<td style="border-bottom: solid #4BACC6 1.0pt; border-left: none; border-right: solid #4BACC6 1.0pt; border-top: none; mso-border-bottom-themecolor: accent5; mso-border-left-alt: solid #4BACC6 1.0pt; mso-border-left-themecolor: accent5; mso-border-right-themecolor: accent5; mso-border-top-alt: solid #4BACC6 1.0pt; mso-border-top-themecolor: accent5; padding: 0in 5.4pt 0in 5.4pt;" valign="top"><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"></span></span></span>
<br />
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<br /></div>
</td>
</tr>
</tbody></table>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraph" style="mso-layout-grid-align: none; mso-list: l0 level1 lfo2; mso-pagination: none; tab-stops: .5in 1.0in 1.5in 2.0in 2.5in 3.0in 3.5in 4.0in 4.5in 5.0in 5.5in 6.0in; text-autospace: none; text-indent: -.25in;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><!--[if !supportLists]--><b style="mso-bidi-font-weight: normal;"><span style="font-size: 12.0pt; mso-bidi-font-family: Verdana; mso-fareast-font-family: Verdana;"><span style="mso-list: Ignore;">2.<span style="font: 7.0pt "Times New Roman";"> </span></span></span></b><!--[endif]--><b style="mso-bidi-font-weight: normal;"><span style="font-family: Arial; font-size: 12.0pt; mso-bidi-language: AR-SA;">Queries</span></b></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-size: 12.0pt; mso-bidi-font-family: Arial;"><o:p></o:p></span></b></span></span></span></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In NoSQL data modeling, the design of the tables is driven by the queries that will be run against the data rather than by its entities and relationships; in other words, we optimize for reads at write time. Here are the types of queries that we want to run against the data:</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618953"><span style="font-family: Arial; font-size: 12.0pt;">Types of Queries:</span></a></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;"><o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">- Q1 -> Get all records for a particular date</span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">- Q2 -> Get a record for a date and a device_id</span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">- Q3 -> Get a record for a device_id</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;"> <o:p></o:p></span></span></span></span></span><br />
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">- Q4 -> Get all records for a date and a timestamp between x and y</span></span></span></span></span></div>
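<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As a rough sketch, the four queries could be written in CQL as follows. The table name device_events and the column names event_date and event_time are hypothetical (they stand for the Date and timestamp fields of the sample data, stored here as text), and the time bounds in Q4 are made-up values for x and y. Whether Cassandra accepts each statement as written depends on the primary key chosen for the table.</div>
<pre>
-- Hypothetical table and column names; literal values taken from the sample data.

-- Q1: all records for a particular date
SELECT * FROM device_events WHERE event_date = '02/05/2013';

-- Q2: record(s) for a date and a device_id
SELECT * FROM device_events
 WHERE event_date = '02/05/2013' AND device_id = 'As-333-fd-45';

-- Q3: record(s) for a device_id
SELECT * FROM device_events WHERE device_id = 'As-333-fd-45';

-- Q4: all records for a date with a timestamp between x and y
-- (the bounds '02:00:00' and '03:00:00' are illustrative only)
SELECT * FROM device_events
 WHERE event_date = '02/05/2013'
   AND event_time &gt;= '02:00:00' AND event_time &lt;= '03:00:00';
</pre>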
<h1 style="mso-list: none; text-indent: 0in;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-size: 12.0pt; font-weight: normal; mso-bidi-font-family: Arial; mso-bidi-font-size: 13.0pt; mso-bidi-font-weight: bold;"><o:p> Let's review the options that we have for constructing the table for this data. We will use CQL for this. The main issue is finding the proper primary key for our objects.</o:p></span></span></span></span></span></h1>
<h3 style="mso-list: l1 level3 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868539"></a><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618958"><span style="mso-bookmark: _Toc254868539;"><!--[if !supportLists]--><span style="mso-bidi-font-family: Arial; mso-bidi-language: AR-SA; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.2.1<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]--><span style="mso-bidi-language: AR-SA;">Option 1:</span></span></a></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="mso-bidi-language: AR-SA;"> <o:p></o:p></span></span></span></span></span></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618959"><span style="font-family: Arial; font-size: 12.0pt;">PRIMARY KEY = 'date,
device_id'</span></a></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;"><o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">The date will be the partitioning key; device_id will be the clustering key. The date value is the same for an entire day's worth of data, so all of that day's rows land in the same partition.<o:p></o:p></span></span></span></span></span><br />
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">The clustering key (device_id), on the other hand, does not have to be unique on its own. It is used to order rows within a partition when querying.</span></span></span></span></span><br />
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">This seems like the wrong design altogether.</span></span></span></span></span></div>
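<div class="MsoNormal">
<span style="font-family: Arial; font-size: 12.0pt;">As a rough sketch, Option 1 could look like this in CQL. The table name, the value column and the column types are assumptions made for illustration; only date, device_id and timestamp come from the design discussed here.</span></div>
<pre>-- Hypothetical table for Option 1:
-- date is the partitioning key, device_id the clustering key.
CREATE TABLE readings_by_date (
    date      text,        -- assumed format 'YYYY-MM-DD'
    device_id text,
    timestamp timestamp,
    value     double,      -- placeholder payload column
    PRIMARY KEY (date, device_id)
);
</pre>
<div class="MsoNormal">
<span style="font-family: Arial; font-size: 12.0pt;">With this key there is only one row per (date, device_id) pair, so a second reading from the same device on the same day overwrites the first, and every write for a given day goes to the same partition.</span></div>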
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<h3 style="mso-list: l1 level3 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868540"></a><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618960"><span style="mso-bookmark: _Toc254868540;"><!--[if !supportLists]--><span style="mso-bidi-font-family: Arial; mso-bidi-font-size: 13.0pt; mso-bidi-language: AR-SA; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.2.2<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="mso-bidi-language: AR-SA;">Option 2:</span></span></a></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="mso-bidi-language: AR-SA;"> </span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="mso-bidi-font-size: 13.0pt; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></h3>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;"> <a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618961">- PRIMARY KEY = 'device_id'</a><o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">Use device_id only as the partitioning key. This is preferable because it is a unique value: the partitioning key is used to distribute the data across nodes, so we want it to be as random as possible.<o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">Secondary index on date; required since date is not part of the key.<o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">Secondary index on timestamp</span></span></span></span></span></div>
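<div class="MsoNormal">
<span style="font-family: Arial; font-size: 12.0pt;">A minimal sketch of Option 2 in CQL might look like the following. The table name, index names, value column and column types are hypothetical; only device_id, date and timestamp come from the text.</span></div>
<pre>-- Hypothetical table for Option 2: device_id alone is the primary key.
CREATE TABLE readings_by_device (
    device_id text PRIMARY KEY,
    date      text,        -- assumed format 'YYYY-MM-DD'
    timestamp timestamp,
    value     double       -- placeholder payload column
);

-- Secondary indexes, needed because date and timestamp are not part of the key.
CREATE INDEX readings_date_idx ON readings_by_device (date);
CREATE INDEX readings_timestamp_idx ON readings_by_device (timestamp);

-- An equality lookup on an indexed column.
SELECT * FROM readings_by_device WHERE date = '2014-03-01';
</pre>
<div class="MsoNormal">
<span style="font-family: Arial; font-size: 12.0pt;">Each index is maintained alongside the table, which is where the extra space consumption comes from.</span></div>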
<div class="MsoNormal">
<br /></div>
<h3 style="mso-list: l1 level3 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868541"></a><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618964"><span style="mso-bookmark: _Toc254868541;"><!--[if !supportLists]--><span style="mso-bidi-font-family: Arial; mso-bidi-font-size: 13.0pt; mso-bidi-language: AR-SA; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.2.3<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="mso-bidi-language: AR-SA;">Option 3:</span></span></a></span></span></span></span><span style="mso-bookmark: _Toc254868541;"></span><span style="mso-bookmark: _Toc254618964;"></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="mso-bidi-font-size: 13.0pt; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></h3>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618965"><span style="font-family: Arial; font-size: 12.0pt;">PRIMARY KEY (device_id, date,
timestamp)</span></a></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;"><o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal">
This removes the need for the secondary indexes and thus saves space. However, it adds a constraint: one must use filtering (i.e., an in-memory scan through a B-tree) for queries on the PK fields.</div>
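<div class="MsoNormal">
Here is a sketch of Option 3 under the same assumptions as before (hypothetical table name, value column, column types and sample values). The second query illustrates the kind of filtering mentioned above and typically has to scan across partitions.</div>
<pre>-- Hypothetical table for Option 3:
-- device_id is the partitioning key; date and timestamp are clustering keys.
CREATE TABLE readings (
    device_id text,
    date      text,        -- assumed format 'YYYY-MM-DD'
    timestamp timestamp,
    value     double,      -- placeholder payload column
    PRIMARY KEY (device_id, date, timestamp)
);

-- The partitioning key is restricted, so no filtering is needed.
SELECT * FROM readings
 WHERE device_id = 'dev-1' AND date = '2014-03-01';

-- The partitioning key is not restricted, so the query needs ALLOW FILTERING.
SELECT * FROM readings
 WHERE date = '2014-03-01' ALLOW FILTERING;
</pre>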
<h3 style="mso-list: l1 level3 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868542"><!--[if !supportLists]--><span style="mso-bidi-font-family: Arial; mso-bidi-font-size: 13.0pt; mso-bidi-language: AR-SA; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.2.4<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="mso-bidi-language: AR-SA;">Option 4:</span></a></span></span></span></span><span style="mso-bookmark: _Toc254868542;"></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="mso-bidi-font-size: 13.0pt; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></h3>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">PRIMARY KEY ((device_id, date), timestamp); <o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Arial; font-size: 12.0pt;">device_id / date will be the partitioning key. timestamp is the
clustering key.<o:p></o:p></span></span></span></span></span></div>
<h1 style="mso-list: none; text-indent: 0in;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-size: 12.0pt; font-weight: normal; mso-bidi-font-family: Arial; mso-bidi-font-size: 13.0pt; mso-bidi-font-weight: bold;"><o:p> This lets us avoid the 'allow filtering' clause, which slows down queries. However, it prevents us from querying on some of the PK fields.</o:p></span></span></span></span></span></h1>
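<div class="MsoNormal">
<span style="font-family: Arial; font-size: 12.0pt;">Option 4 could be sketched as follows. Again, the table name, value column, column types and sample values are assumptions made for illustration.</span></div>
<pre>-- Hypothetical table for Option 4:
-- (device_id, date) is the composite partitioning key, timestamp the clustering key.
CREATE TABLE readings_by_device_day (
    device_id text,
    date      text,        -- assumed format 'YYYY-MM-DD'
    timestamp timestamp,
    value     double,      -- placeholder payload column
    PRIMARY KEY ((device_id, date), timestamp)
);

-- Both parts of the partitioning key are supplied, so a time range
-- on the clustering key works without ALLOW FILTERING.
SELECT * FROM readings_by_device_day
 WHERE device_id = 'dev-1'
   AND date = '2014-03-01'
   AND timestamp &gt;= '2014-03-01 00:00:00+0000'
   AND timestamp &lt; '2014-03-01 12:00:00+0000';
</pre>
<div class="MsoNormal">
<span style="font-family: Arial; font-size: 12.0pt;">The trade-off is that a query must provide both device_id and date; a lookup on device_id alone does not identify a partition with this key.</span></div>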
<h2 style="mso-list: l1 level2 lfo1;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868543"></a><a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254618966"><span style="mso-bookmark: _Toc254868543;"><!--[if !supportLists]--><span style="font-style: normal; mso-bidi-font-family: Arial; mso-bidi-font-style: italic; mso-bidi-language: AR-SA; mso-fareast-font-family: Arial;"><span style="mso-list: Ignore;">1.3<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]--><span style="mso-bidi-language: AR-SA;">Summary</span></span></a></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></h2>
<div class="MsoNormal">
<br /></div>
<table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; margin-left: -.05in; mso-border-alt: solid black 1.0pt; mso-border-insideh: .75pt solid black; mso-border-insidev: .75pt solid black; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-table-layout-alt: fixed; width: 512px;">
<tbody>
<tr style="mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td style="background: #CCFFFF; border: solid black 1.0pt; mso-border-bottom-alt: .75pt; mso-border-color-alt: black; mso-border-left-alt: 1.0pt; mso-border-right-alt: .75pt; mso-border-style-alt: solid; mso-border-top-alt: 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: .75in;" valign="top" width="54"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Options</span></b></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></b></span></span></span></span></div>
</td>
<td style="background: #CCFFFF; border-left: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 51.7pt;" valign="top" width="52"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Q1</span></b></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></b></span></span></span></span></div>
</td>
<td style="background: #CCFFFF; border-left: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 35.25pt;" valign="top" width="35"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Q2</span></b></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></b></span></span></span></span></div>
</td>
<td style="background: #CCFFFF; border-left: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 52.55pt;" valign="top" width="53"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Q3</span></b></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></b></span></span></span></span></div>
</td>
<td style="background: #CCFFFF; border-left: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 49.5pt;" valign="top" width="50"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Q4</span></b></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></b></span></span></span></span></div>
</td>
<td style="background: #CCFFFF; border-left: none; border: solid black 1.0pt; mso-border-bottom-alt: .75pt; mso-border-color-alt: black; mso-border-left-alt: .75pt; mso-border-left-alt: solid black .75pt; mso-border-right-alt: 1.0pt; mso-border-style-alt: solid; mso-border-top-alt: 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 268.6pt;" valign="top" width="269"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Comments / Pros & Cons </span></b></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><b style="mso-bidi-font-weight: normal;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></b></span></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 1;">
<td style="border-top: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black 1.0pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .75in;" width="54"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">1</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 51.7pt;" valign="top" width="52"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;"> Yes</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 35.25pt;" valign="top" width="35"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 52.55pt;" valign="top" width="53"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes with
filtering</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 49.5pt;" valign="top" width="50"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">No</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-right-alt: solid black 1.0pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 268.6pt;" valign="top" width="269"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none; text-indent: -24.0pt;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">1. Wide row if a partition holds a lot of records</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none; text-indent: -24.0pt;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="color: #b13b3c; font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">2</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">. The node will be
very hot as everything is going in one row for one day. Lots of contention.
Should be avoided</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 2;">
<td style="border-top: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black 1.0pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .75in;" width="54"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">2</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 51.7pt;" valign="top" width="52"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 35.25pt;" valign="top" width="35"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 52.55pt;" valign="top" width="53"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 49.5pt;" valign="top" width="50"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes with
filtering</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-right-alt: solid black 1.0pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 268.6pt;" valign="top" width="269"><div class="MsoNormal" style="margin-bottom: 14.0pt; margin-left: 0in; margin-right: 35.4pt; margin-top: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Q4 will be served with ALLOW FILTERING</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal" style="margin-bottom: 14.0pt; margin-left: 0in; margin-right: 35.4pt; margin-top: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none; text-indent: -24.0pt;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">1. More space
consumption due to secondary indices</span></span></span></span></span><span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Calibri; font-size: 14.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></span></span></span></span></div>
</td>
</tr>
<tr style="mso-yfti-irow: 3;">
<td style="border-top: none; border: solid black 1.0pt; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black 1.0pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .75in;" width="54"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">3<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 51.7pt;" valign="top" width="52"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes with
allow filtering<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 35.25pt;" valign="top" width="35"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 52.55pt;" valign="top" width="53"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 49.5pt;" valign="top" width="50"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes with
allow filtering<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; mso-border-alt: solid black .75pt; mso-border-left-alt: solid black .75pt; mso-border-right-alt: solid black 1.0pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 268.6pt;" valign="top" width="269"><div class="MsoNormal" style="margin-bottom: 14.0pt; margin-left: 0in; margin-right: 35.4pt; margin-top: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">No index needed, but ALLOW FILTERING is not very<span style="color: #032553;"> efficient (~ B-tree in memory), even less so than a
secondary index on partitionKey.</span><o:p></o:p></span></span></span></span></span></div>
</td>
</tr>
<tr style="height: 45.15pt; mso-yfti-irow: 4; mso-yfti-lastrow: yes;">
<td style="border-top: none; border: solid black 1.0pt; height: 45.15pt; mso-border-bottom-alt: 1.0pt; mso-border-color-alt: black; mso-border-left-alt: 1.0pt; mso-border-right-alt: .75pt; mso-border-style-alt: solid; mso-border-top-alt: .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: .75in;" width="54"><div align="center" class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">4<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; height: 45.15pt; mso-border-alt: solid black .75pt; mso-border-bottom-alt: solid black 1.0pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 51.7pt;" valign="top" width="52"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">No, needs
rowkey<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; height: 45.15pt; mso-border-alt: solid black .75pt; mso-border-bottom-alt: solid black 1.0pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 35.25pt;" valign="top" width="35"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">Yes<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; height: 45.15pt; mso-border-alt: solid black .75pt; mso-border-bottom-alt: solid black 1.0pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 52.55pt;" valign="top" width="53"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">No, needs
partition key<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; height: 45.15pt; mso-border-alt: solid black .75pt; mso-border-bottom-alt: solid black 1.0pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 49.5pt;" valign="top" width="50"><div class="MsoNormal" style="margin-bottom: 14.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">No, needs
rowkey<o:p></o:p></span></span></span></span></span></div>
</td>
<td style="border-bottom: solid black 1.0pt; border-left: none; border-right: solid black 1.0pt; border-top: none; height: 45.15pt; mso-border-left-alt: solid black .75pt; mso-border-top-alt: solid black .75pt; padding: 0in 5.4pt 0in 5.4pt; width: 268.6pt;" valign="top" width="269"><div class="MsoNormal" style="margin-bottom: 14.0pt; margin-left: 0in; margin-right: 35.4pt; margin-top: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="mso-bookmark: _Toc326936201;"><span style="mso-bookmark: _Toc300259833;"><span style="mso-bookmark: _Toc300259933;"><span style="mso-bookmark: _Toc213234777;"><span style="font-family: Tahoma; font-size: 11.0pt; mso-bidi-language: AR-SA;">No need for filtering, but then too
restrictive.<o:p></o:p></span></span></span></span></span></div>
</td>
</tr>
</tbody></table>
<div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<br />
<div class="MsoNormal">
The best choice seems to be option 2.</div>
<div class="MsoNormal">
<br /></div>
<h1>
<a name="_Toc254868544"></a><a name="_Toc254618967">2 Appendix</a></h1>
<h2>
Definitions</h2>
<div>
partitionkey = date</div>
<div>
rowkey = device_id</div>
<div>
range = timestamp.</div>
<h2>
<a name="_Toc254868545"></a><a name="_Toc254618968">2.1 Option 1</a></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">CREATE TABLE tablestorage6
(<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> partitionkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> rowkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> policyid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> policyname text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> range timestamp,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">  PRIMARY KEY (partitionkey, rowkey));<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">CREATE INDEX i1 ON
tablestorage6 (range);</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">cqlsh:demodb> select *
from tablestorage6 ;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">--------------+--------+----------+------------+--------------------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">r1</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pl1</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">plicy1</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">2007-01-02 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">p2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">r3</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pl3</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">plicy3</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">2008-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">(3 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">cqlsh:demodb> select *
from tablestorage6 where rowkey = 'r2' allow filtering;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">--------------+--------+----------+------------+--------------------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
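<div class="MsoNormal">
The index i1 on range is created above, but none of the queries shown for this option use it. A minimal sketch of a lookup that would go through that index, assuming the three sample rows listed above and an equality match on the indexed column (the timestamp literal is only an illustration):</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">select * from tablestorage6 where range = '2007-01-03 00:00:00+0000';</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Because the predicate is an equality on an indexed column, this form should not need allow filtering, unlike the rowkey-only query above.</div>
<div class="MsoNormal">
<br /></div>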
<h3>
<a name="_Toc254868546"></a><a name="_Toc254618969">2.1.1 Option 2</a></h3>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">CREATE TABLE tablestorage
(<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> rowkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> clientipaddress text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> deviceid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> eventtype text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> grouppolicyid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> grouppolicyname text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> loglevel text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> macaddress text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> messageid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> model text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> msgparam text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> packageid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> partitionkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> policyid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> policyname text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> serialnumber text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> "timestamp" timestamp,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> username text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">  PRIMARY KEY (rowkey));<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">create index a on
tablestorage(partitionkey);<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">select * from tablestorage
where rowkey='726716f7-2e54-4715-9f00-91dcbea999448' and partitionkey = '2' ;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">clientipaddress</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">deviceid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">eventtype</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">grouppolicyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">grouppolicyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">loglevel</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">macaddress</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">messageid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">model</span></b><span style="font-family: 'Menlo 
Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">msgparam</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">packageid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">serialnumber</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">timestamp</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">username</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">---------------------------------------+-----------------+----------+--------------+---------------+-----------------+----------+------------+-----------+--------+----------+-----------+--------------+----------+------------+--------------+------------------------------+----------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">726716f7-2e54-4715-9f00-91dcbea999448</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">Host2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">id2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">CreateSchema</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">gpid2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">gp2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">debug</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">1234</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">5</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">model2</span></b><span style="font-family: 'Menlo Regular'; 
font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">sgp2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pkg2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pol2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">plnme2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">seril2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">20133114-07-02 03:19:39+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">matt8</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">cqlsh:demodb> select *
from tablestorage where partitionkey = '2' and timestamp > '20131212' allow
filtering;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">clientipaddress</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">deviceid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">eventtype</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">grouppolicyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">grouppolicyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">loglevel</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">macaddress</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">messageid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">model</span></b><span style="font-family: 'Menlo 
Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">msgparam</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">packageid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">serialnumber</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">timestamp</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">username</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">---------------------------------------+-----------------+----------+--------------+---------------+-----------------+----------+------------+-----------+--------+----------+-----------+--------------+----------+------------+--------------+------------------------------+----------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">726716f7-2e54-4715-9f00-91dcbea999442</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">Host2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">id2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">CreateSchema</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">gpid2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">gp2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">debug</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">1234</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">5</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">model2</span></b><span style="font-family: 'Menlo Regular'; 
font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">sgp2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pkg2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">pol2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">plnme2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">seril2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">20133114-07-02 03:14:27+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"> | </span><b><span style="color: #686c03; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">matt2</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h3>
<a name="_Toc254868547"></a><a name="_Toc254618970">2.1.2 Option 3</a></h3>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">CREATE TABLE
tablestorage10 (<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> rowkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> partitionkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> range timestamp,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> policyid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> policyname text,<o:p></o:p></span></div>
<div class="MsoNormal">
<b><span style="font-family: 'Menlo Regular'; font-size: 13pt;">  PRIMARY KEY (rowkey,
partitionkey, range));<o:p></o:p></span></b></div>
<div class="MsoNormal">
<br /></div>
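<div class="MsoNormal">
The post does not show how the sample rows were loaded. A sketch of inserts that would produce the two rows listed below, with the values taken from that output (the statements themselves are an assumption, not the original load script):</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">insert into tablestorage10 (rowkey, partitionkey, range, policyid, policyname) values ('r3', 'p2', '2008-01-03 00:00:00+0000', 'pl3', 'plicy3');</span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">insert into tablestorage10 (rowkey, partitionkey, range, policyid, policyname) values ('r2', 'p1', '2007-01-03 00:00:00+0000', 'pl2', 'plicy2');</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Note that with this primary key, rowkey is what Cassandra uses as the partition key, while the columns named partitionkey and range act as clustering columns within each rowkey.</div>
<div class="MsoNormal">
<br /></div>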
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">--------+--------------+--------------------------+----------+------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">r3</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">p2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">2008-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">pl3</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">plicy3</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">(2 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">Q4: cqlsh:demodb>
select * from tablestorage10 where partitionkey= 'p1' and range >
'2006-01-01' allow filtering;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">--------+--------------+--------------------------+----------+------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">But without the partitionkey restriction, the same query fails:<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10 where range > '2006-01-01' allow filtering;<o:p></o:p></span></div>
<div class="MsoNormal">
<b><span style="color: #890006; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">Bad Request: PRIMARY KEY
part range cannot be restricted (preceding part partitionkey is either not
restricted or by a non-EQ relation)</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10 where partitionkey='p1' and
range > '2006-01-01' allow filtering;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">--------+--------------+--------------------------+----------+------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10 where rowkey = 'r2' and
range > '2006-01-01' allow filtering;<o:p></o:p></span></div>
<div class="MsoNormal">
<b><span style="color: #890006; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">Bad Request: PRIMARY KEY
part range cannot be restricted (preceding part partitionkey is either not
restricted or by a non-EQ relation)</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10 where rowkey = 'r2';<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">--------+--------------+--------------------------+----------+------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10 where rowkey = 'r2' and partitionkey = 'p1';<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">--------+--------------+--------------------------+----------+------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10 where partitionkey = 'p1';<o:p></o:p></span></div>
<div class="MsoNormal">
<b><span style="color: #890006; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">Bad Request: Cannot
execute this query as it might involve data filtering and thus may have
unpredictable performance. If you want to execute this query despite the
performance unpredictability, use ALLOW FILTERING</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> select *
from tablestorage10 where partitionkey = 'p1' allow filtering;<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">rowkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">partitionkey</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">range</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyid</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #97009a; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">policyname</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">--------+--------------+--------------------------+----------+------------<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;"> </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">r2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">p1</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #107b02; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">2007-01-03 00:00:00+0000</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">pl2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"> | </span><b><span style="color: #686b03; font-family: "Menlo Regular"; font-size: 13.0pt; mso-bidi-language: AR-SA;">plicy2</span></b><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">(1 rows)<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 13pt;">cqlsh:demodb> </span><span style="font-family: 'Menlo Regular'; font-size: 13pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h3>
<a href="https://www.blogger.com/blogger.g?blogID=170648781806274754&pli=1" name="_Toc254868548"><!--[if !supportLists]--><span style="font-family: Calibri; font-size: 16.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA; mso-fareast-font-family: Calibri;">2.1.3<span style="font-family: 'Times New Roman'; font-size: 7pt; font-weight: normal;"> </span></span><!--[endif]-->Option 4</a><span style="font-family: Calibri; font-size: 16.0pt; mso-bidi-font-family: Calibri; mso-bidi-language: AR-SA;"><o:p></o:p></span></h3>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">CREATE TABLE
tablestorage8 (<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> partitionkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> rowkey text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> range timestamp,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> policyid text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;"> policyname text,<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">
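PRIMARY KEY ((partitionkey, rowkey), range)<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">);<o:p></o:p></span></div>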
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">demodb>
select * from tablestorage8 where
partitionkey='p2';<o:p></o:p></span></div>
<div class="MsoNormal">
<b><span style="color: #8a0006; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">Bad Request: Partition key part rowkey
must be restricted since preceding part is<o:p></o:p></span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">select * from
tablestorage8 where rowkey='r3' and range > '20131212';<o:p></o:p></span></div>
<div class="MsoNormal">
<b><span style="color: #8a0006; font-family: "Menlo Bold"; font-size: 11.0pt; mso-bidi-language: AR-SA;">Bad Request:
partition key part rowkey cannot be restricted (preceding part partitionkey is
either not restricted or by a non-EQ relation)</span></b><span style="font-family: 'Menlo Regular'; font-size: 11pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: 'Menlo Regular'; font-size: 11pt;">cqlsh:demodb>
<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
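<div class="MsoNormal">
Both errors above come from the same rule: with a composite partition key, every partition key column has to be restricted by equality before a clustering column such as range can be restricted. Below is a minimal Python sketch of that rule, as a toy model only. It is not Cassandra's actual validation code, and it ignores IN, token(), secondary indexes and ALLOW FILTERING. The names TableKey, check_where and TABLESTORAGE8 are made up for illustration.</div>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableKey:
    """Hypothetical description of a table's primary key layout."""
    partition_key: tuple  # columns of the (possibly composite) partition key
    clustering: tuple     # clustering columns, in declared order


def check_where(key, restrictions):
    """Toy check of a WHERE clause against the key layout.

    `restrictions` maps a column name to its operator ('=', '>', '<', ...).
    Returns (allowed, reason). Columns outside the primary key are ignored.
    """
    # Every partition key column must be restricted by equality.
    for col in key.partition_key:
        if restrictions.get(col) != "=":
            return False, f"partition key part {col} must be restricted by '='"

    # A clustering column may be restricted only if all preceding
    # clustering columns are restricted by equality.
    preceding_all_eq = True
    for col in key.clustering:
        op = restrictions.get(col)
        if op is None:
            preceding_all_eq = False
            continue
        if not preceding_all_eq:
            return False, f"clustering column {col} cannot be restricted"
        preceding_all_eq = (op == "=")
    return True, "ok"


# Key layout of tablestorage8: PRIMARY KEY ((partitionkey, rowkey), range)
TABLESTORAGE8 = TableKey(("partitionkey", "rowkey"), ("range",))

if __name__ == "__main__":
    for where in (
        {"partitionkey": "="},                              # rowkey missing
        {"rowkey": "=", "range": ">"},                      # partitionkey missing
        {"partitionkey": "=", "rowkey": "=", "range": ">"}, # full partition key
    ):
        print(where, check_where(TABLESTORAGE8, where))
```

<div class="MsoNormal">
Under this model, only the third WHERE clause is accepted, which suggests the query shape to use with tablestorage8: give both partitionkey and rowkey by equality, then optionally add a range condition.</div>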
<!--EndFragment--><br />
<div class="MsoNormal">
<br /></div>
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0tag:blogger.com,1999:blog-170648781806274754.post-14259739205265939832014-02-04T09:11:00.001-08:002014-02-04T09:11:53.395-08:00IBM stampede program notes<div style="border-bottom: solid #4F81BD 1.0pt; border: none; mso-border-bottom-themecolor: accent1; mso-element: para-border-div; padding: 0in 0in 4.0pt 0in;">
<div class="MsoTitle">
The IBM Stampede training is a second-part training on
IBM products; one of its main
attractions was that it was taught by an actual IBM solutions architect, who gave
real-world examples from his past projects. These are some of the notes taken
during the training.</div>
</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h1>
Components<o:p></o:p></h1>
<div class="MsoNormal">
It is important to have some kind of data stewardship when
dealing with large amounts of data; many companies instead handle this in
an ad-hoc way.<o:p></o:p></div>
<div class="MsoNormal">
There are essentially 3 components at play:<o:p></o:p></div>
<div class="MsoNormal">
- Hadoop ; data at rest, landed<o:p></o:p></div>
<div class="MsoNormal">
- Stream ; data in motion<o:p></o:p></div>
<div class="MsoNormal">
- Data warehouse<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
IBM offers these 3 components as part of their Big Insights
offering.<o:p></o:p></div>
<div class="MsoNormal">
Also, IBM offers Accelerators that are essentially frameworks
for working with specific use cases, for data that is not harmonized together.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In addition, IBM offers Watson, which can perform natural language
processing (NLP) within a given context.<o:p></o:p></div>
<div align="right" class="MsoNormal" style="text-align: right;">
<br /></div>
<div align="right" class="MsoNormal" style="text-align: right;">
<br /></div>
<h2>
Hadoop<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Hadoop in the context of the Data warehouse (DW)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Hadoop’s sweet spot from a data perspective is as a
queryable archive of data: the cold/unused data offloaded from the expensive
DW.<o:p></o:p></div>
<div class="MsoNormal">
Hadoop is seen as performing DW Augmentation. The DW
(typically Netezza) stays; it is only complemented by Hadoop.<o:p></o:p></div>
<div class="MsoNormal">
IBM talks about the Analytics landing zone as the logical
data storage area.<o:p></o:p></div>
<div class="MsoNormal">
In comparison, this is similar to the Enterprise Data Hub from
Cloudera, or the Data Lake from Hortonworks.<o:p></o:p></div>
<div class="MsoNormal">
The landing zone is essentially for raw data (which ends up
being stored for a long time) in addition to modeled data.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Obviously from a cost perspective, Hadoop is cheaper. It is
also for “untrusted sources”, vs trusted sources in the DW. Hence the data is
segmented.<o:p></o:p></div>
<div class="MsoNormal">
Hadoop is mainly used today to offload cold data that is
typically unused.<o:p></o:p></div>
<div class="MsoNormal">
This data now becomes a queryable archive, instead of being
stored on tape.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Delta files also end up being stored in the landing zone. They
stay in the low cost platform, for recovery purposes.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Hadoop has a “Wild West” mentality today, the same as
the Data warehouse had 10 years ago! For example, in the DW there used to
be no systems management and no recovery. This is now the case in Hadoop: e.g.
the relatively poor POSIX support of HDFS, the non-existent audit trail, etc.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Data Warehouse<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The definition of a DW is a central repository used for
reporting & data analysis.<o:p></o:p></div>
<div class="MsoNormal">
Its challenges are:<o:p></o:p></div>
<div class="MsoNormal">
- It stores structured data.<o:p></o:p></div>
<div class="MsoNormal">
- It mainly uses batch data.<o:p></o:p></div>
<div class="MsoNormal">
- It has limited history, due to data volume constraints;
thus, it mainly stores aggregated views.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A data-warehousing instance usually comes with a set of
roles and processes, including the following:<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->A Data owner<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->A Data steward <o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Data governance<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Measurements via KPIs<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Data Lineage to trace the data.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
With Hadoop we now talk about DW Augmentation, to leverage
all data and get timely insights in a cost-optimized way. Some data
federation between the DW and Hadoop is also starting to appear, via tools like Cirro;
but typically you want to avoid data latency and data movement (the data is not
co-located), depending on the cardinality of the data.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
Use Cases<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
IBM sees Big Data exploration as 90% of the use cases. A lot
of use cases have to do with finding new expected traits, via exhaust data.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Other use cases vary depending on the vertical; an example
is fraud detection with Hadoop at a major credit card company. In that use
case, detailed transactions are aggregated in a fraud model, utilizing a large
volume of structured data, with a small set of users; this is NOT like the standard
social data analytics use case that creates all the buzz today.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
IBM-specific Big Data tools<o:p></o:p></h1>
<h2>
BigSQL<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
BigSQL is IBM’s “secret sauce” around querying data in
Hadoop, and is a SQL-like high-level language and wrapper to give access to the
data in Big Insights.<o:p></o:p></div>
<div class="MsoNormal">
Compared to Cloudera’s Impala, IBM says it is fully SQL-92
compliant, and is more accurate. That
said, BigSQL is still a revision 0 product. It is very effective at what it does because
some of the IBM research results went into its design; e.g. the DB2 cost-based
optimizers are in there.<o:p></o:p></div>
<div class="MsoNormal">
It also has a “local table” approach to querying the data,
effectively bypassing HDFS by using local storage; this is one of its value-adds.
BigSQL can access data stored in HBase, Hive, or DB2 for that matter.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Also, BigSQL has advantages over Hive: at the time of writing, Hive only
supports subqueries in the FROM clause, not in the WHERE clause.<o:p></o:p></div>
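<div class="MsoNormal">
To make the subquery point concrete, here is the kind of SQL-92 query at stake: a subquery inside the WHERE clause. This is only a sketch. It runs against SQLite from Python's standard library as a stand-in, not against BigSQL or Hive, and the policies and claims tables (and the function name) are hypothetical.</div>

```python
import sqlite3


def policies_with_claims(conn):
    """Return names of policies that have at least one claim.

    Uses a WHERE-clause subquery (IN ... SELECT), standard SQL-92.
    """
    rows = conn.execute(
        """
        SELECT policyname
        FROM policies
        WHERE policyid IN (SELECT policyid FROM claims)
        ORDER BY policyname
        """
    ).fetchall()
    return [name for (name,) in rows]


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE policies (policyid TEXT PRIMARY KEY, policyname TEXT);
        CREATE TABLE claims (claimid INTEGER PRIMARY KEY, policyid TEXT, amount REAL);
        INSERT INTO policies VALUES ('pl1', 'policy1');
        INSERT INTO policies VALUES ('pl2', 'policy2');
        INSERT INTO claims VALUES (1, 'pl2', 120.0);
        INSERT INTO claims VALUES (2, 'pl2', 80.0);
        """
    )
    return conn


if __name__ == "__main__":
    print(policies_with_claims(build_demo_db()))
```

<div class="MsoNormal">
An engine without WHERE-clause subqueries needs the same logic rewritten as a join, which is the kind of rework a SQL-92 compliant layer avoids.</div>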
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Data transformations in Big Insights<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
IBM’s Big Insights integrates Apache’s Pig and Hive which
are the most common tools used by the community to perform transformations.<o:p></o:p></div>
<div class="MsoNormal">
For text analytics, the Annotation Query Language (AQL) is
widely used, via context extractors.<o:p></o:p></div>
<div class="MsoNormal">
And BigSheets, the Excel-like Big Data tool, generates Pig
under the hood; it will also integrate with BigSQL soon.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
IBM also attempts to enhance data science tools like R: today
R doesn’t work in parallel for the Reduce side of M/R and goes back to a single
node. With IBM Big R, the execution of these functions is parallelized; in
addition it removes R’s memory limitation.<o:p></o:p></div>
<div class="MsoNormal">
For data mining, SPSS and SAS are still the most widely used
tools for predictive analytics. Of note, the FDA used to certify drug model
predictions only with SAS, but recently opened it up to R for statistical models.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h2>
Data explorer<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
IBM’s Data Explorer <span style="background: white; color: #333333; font-family: Arial; font-size: 11.5pt; mso-fareast-font-family: "Times New Roman";">is a search tool that provides core indexing, discovery,
navigation and search capabilities. The user typically visualizes the results
on a portal.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #333333; font-family: Arial; font-size: 11.5pt; mso-fareast-font-family: "Times New Roman";">However, the data is on its
own server, and usually the </span>size of the index is 2.5-3 times larger than the
data.<o:p></o:p></div>
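<div class="MsoNormal">
As a rough capacity-planning sketch based on the 2.5-3x ratio above (the function name and the 500 GB figure are illustrative assumptions, not Data Explorer settings):</div>

```python
def estimate_index_size_gb(data_gb, low_ratio=2.5, high_ratio=3.0):
    """Return the (low, high) estimated index size in GB for a given data size."""
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    return data_gb * low_ratio, data_gb * high_ratio


low, high = estimate_index_size_gb(500)
print(f"Index for 500 GB of data: {low:.0f}-{high:.0f} GB")
# Index for 500 GB of data: 1250-1500 GB
```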
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h1>
Analytics<o:p></o:p></h1>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
There are essentially 3 levels of analytics reporting:<o:p></o:p></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Descriptive:<span style="mso-spacerun: yes;">
</span>essentially what is known as operational reports, summarizing the data,
usually through a certain type of metric (e.g. number of followers). This is akin to
looking through your rear-view mirror, without knowing where you are going.<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Predictive: in-depth analysis of the data via
data mining tools, to try and make predictions about the future from the data
that you have based on a set of assumptions.<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-ascii-font-family: Cambria; mso-bidi-font-family: Cambria; mso-fareast-font-family: Cambria; mso-hansi-font-family: Cambria;"><span style="mso-list: Ignore;">-<span style="font: 7.0pt "Times New Roman";">
</span></span></span><!--[endif]-->Prescriptive: in addition to giving predictions
about the data, a prescriptive analysis recommends courses of action based on
actionable data and a feedback system.<o:p></o:p></div>
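<div class="MsoNormal">
The three levels can be sketched on a toy series of weekly follower counts. Everything here is an assumption made for illustration: the numbers, the straight-line trend used for the prediction, and the rule used for the recommendation.</div>

```python
from statistics import mean


def describe(counts):
    """Descriptive: summarize what has already happened."""
    return {"total_gain": counts[-1] - counts[0], "average": mean(counts)}


def predict_next(counts):
    """Predictive: fit a least-squares line and extrapolate one step ahead.

    The underlying assumption is that the trend stays linear.
    """
    n = len(counts)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(counts)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescribe(forecast, target):
    """Prescriptive: turn the forecast into a recommended action (hypothetical rule)."""
    return "run a promotion campaign" if forecast < target else "keep the current plan"


followers = [100, 110, 120, 130]
print(describe(followers))
forecast = predict_next(followers)
print(forecast)
print(prescribe(forecast, target=150))
```

<div class="MsoNormal">
A real prescriptive system would also feed the outcome of the recommended action back into the model, which this sketch leaves out.</div>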
<div class="MsoNormal">
<br /></div>
<h2>
Use case<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In the retail vertical, a typical use case has been around
how to better induce a sale. Retailers are essentially looking for the
shopper’s trigger point. <o:p></o:p></div>
<div class="MsoNormal">
To deduce this, retailers obtain the MAC addresses of mobile
phones in range of their router (no need to be on the Wi-Fi, mobiles
broadcast their MAC address!) to find patterns in the routes shoppers take through the aisles.<o:p></o:p></div>
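<div class="MsoNormal">
A minimal sketch of how such aisle routes could be derived, assuming each sighting has already been reduced to a (MAC address, timestamp, aisle) record. How a sighting gets mapped to an aisle is not covered here, and the MAC values and aisle names are made up.</div>

```python
from collections import Counter, defaultdict


def aisle_routes(sightings):
    """Group (mac, timestamp, aisle) sightings into one ordered route per device."""
    by_device = defaultdict(list)
    for mac, ts, aisle in sightings:
        by_device[mac].append((ts, aisle))

    routes = {}
    for mac, visits in by_device.items():
        visits.sort()  # order each device's sightings by timestamp
        route = []
        for _, aisle in visits:
            if not route or route[-1] != aisle:  # collapse consecutive repeats
                route.append(aisle)
        routes[mac] = tuple(route)
    return routes


def most_common_routes(sightings, top=3):
    """Count how many devices followed each route and return the most frequent ones."""
    return Counter(aisle_routes(sightings).values()).most_common(top)


sightings = [
    ("aa:aa", 1, "dairy"), ("aa:aa", 2, "dairy"), ("aa:aa", 3, "bakery"),
    ("bb:bb", 1, "dairy"), ("bb:bb", 5, "bakery"),
    ("cc:cc", 2, "produce"),
]
print(most_common_routes(sightings, top=1))  # [(('dairy', 'bakery'), 2)]
```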
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In the car vertical, IBM has done work with major car
manufacturers to pull data in motion from car sensors. Aggregation of all of
the data can give some very interesting insights on the typical usage of the
cars.<o:p></o:p></div>
<br />
<div class="MsoNormal">
They mentioned that a new car like a Ford Fusion, which has a
lot of sensors, will yield about 2 TB of data!<o:p></o:p></div>
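<div class="MsoNormal">
With volumes like this, aggregation of data in motion usually means keeping running summaries instead of storing every reading. The sketch below shows the idea under stated assumptions: readings arrive as (sensor name, value) pairs, and the sensor names and the class are hypothetical, not part of any IBM product.</div>

```python
from collections import defaultdict


class RunningStats:
    """Constant-memory summary of a stream of numeric readings."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.maximum = float("-inf")

    def update(self, value):
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


def aggregate(readings):
    """Fold a stream of (sensor, value) pairs into per-sensor running stats."""
    stats = defaultdict(RunningStats)
    for sensor, value in readings:
        stats[sensor].update(value)
    return stats


readings = [("speed_kmh", 50), ("speed_kmh", 70), ("rpm", 2000)]
stats = aggregate(readings)
print(stats["speed_kmh"].mean, stats["speed_kmh"].maximum)  # 60.0 70
```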
Anonymoushttp://www.blogger.com/profile/04037423523500875654noreply@blogger.com0