Today's lecture is about Hive and Beeswax. Please find the lecture slides here: Lecture25
and today's homework here: HW4_AH
Today's lecture is about programming Pig using Pig Latin. We go over several commands and data type structures and implement a couple of examples. Please find today's slides here: Lecture24
and the exercise here: Exercise24
The example file I used in class: grades
The first issue some of you might run into is “safemode” on the namenode. For some reason the image sometimes starts with the namenode already in safemode, and sometimes the namenode also enters it spontaneously,
and I don’t know why…
Always start the Cloudera Manager first. I right-click it and open the manager in a second browser tab, which makes starting HUE easier later. Once the manager is running, look for the HDFS1 entry, which should show a green circle behind it. In the same row you see a link called namenode 1, which you need to click. In the namenode tab you see a button called “Actions”. From that button choose “leave safemode” and confirm the action. Then wait a couple of seconds until the action has completed successfully.
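The same check can also be done from a terminal instead of the Cloudera Manager. The following is only a minimal sketch, assuming the `hdfs` command is on the PATH of the virtual machine and that you run it as a user who is allowed to administer HDFS (often the `hdfs` user). The helper names `safemode_command`, `parse_safemode_status` and `leave_safemode_if_needed` are made up for this illustration.

```python
import subprocess


def safemode_command(action):
    """Build the `hdfs dfsadmin -safemode <action>` command line."""
    if action not in {"get", "enter", "leave", "wait"}:
        raise ValueError("unknown safemode action: " + action)
    return ["hdfs", "dfsadmin", "-safemode", action]


def parse_safemode_status(report):
    """Return True if the report text says that safe mode is ON."""
    return "safe mode is on" in report.lower()


def leave_safemode_if_needed():
    """Ask the namenode for its state and leave safe mode if it is on.

    Assumes `hdfs` is installed and the current user may administer HDFS.
    Returns True if safe mode had to be left, False otherwise.
    """
    status = subprocess.run(
        safemode_command("get"), capture_output=True, text=True, check=True
    )
    if parse_safemode_status(status.stdout):
        subprocess.run(safemode_command("leave"), check=True)
        return True
    return False
```

Calling `leave_safemode_if_needed()` on the machine that runs Hadoop should have the same effect as the button in the “Actions” menu, provided the assumptions above hold.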
In order to start HUE, either switch back to the first tab or go back until you see the start menu again, and click on “HUE”. This will start the user interface. The top left button in the title row is called “About HUE”. Click it to get to the “Quick Start Wizard”, which on the first run will show a couple of red error messages. The one that causes most of the problems is Oozie not working. Click either on “next” or on “examples”. Here you can install various missing components. Click “all”, because that installs everything that is missing. Once that is installed, keep clicking “next” until you reach “Use the applications” and start “Hue Home”.
Once that is done you need to restart your system.
After the restart, check that the namenode is not in safemode, and while you are there, check whether Oozie is active; if it is not, activate it.
If you now start HUE you should be fine and can start Pig without any more issues.
This lecture is about Pig and Pig Latin. Please find the slides here: Lecture23
and the exercise here: Exercise23
In this part of the lecture we dealt with Hadoop and MapReduce. I have one set of slides for both, since we stayed on certain topics for quite a long time:
The homework can be found here:
This is the first lecture about Hadoop, and we need to install a couple of things to get it going. I will talk you through it in class, but here is the to-do list as well: Exercise19
And the lecture: Lecture19
This lecture presents a collection of different approaches to detect outliers and anomalies.
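As a small illustration of one very simple approach (the slides may cover different methods), here is a sketch of the interquartile range rule, also known as Tukey's fences: a value counts as an outlier if it lies more than `k` times the interquartile range below the first or above the third quartile. The function name `iqr_outliers` and the default `k=1.5` are choices made for this example only.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside Tukey's fences."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


grades = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(grades))  # prints [95]
```

The rule needs no assumption about the shape of the distribution, but it only looks at one variable at a time.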
The lecture slides are here: Lecture 16
The exercise is here: Exercise 16
The files for the exercise:
These are a couple of example questions. I will of course ask more, and also about topics not included in this example. The questions here are intended to illustrate the kind of questions asked. Imagine this kind of question, but about all topics covered.
Today's lecture is about hierarchical clustering and how the results of clustering methods can be compared.
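One common way to compare two clusterings of the same data is the Rand index: the fraction of point pairs on which both clusterings agree, that is, both put the pair into the same cluster or both separate it. The sketch below is only an example of such a measure and is not necessarily the one used in the slides; the function name `rand_index` is chosen for this illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs treated the same way by both clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Same grouping with swapped label names: perfect agreement.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # prints 1.0
```

Because only pairs are compared, the measure does not care how the clusters are named, which is what you want when comparing the output of two different methods.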
The slides are here: Lecture 15
The iPython notebook is here: cse891 – Clustering II.ipynb
Instead of a typical exercise, I compiled a list of covered topics for next week's midterm: Exercise 15
This lecture is about clustering. We will introduce the idea of clustering using k-means as an example method. We discuss the advantages and caveats of k-means as well as different ideas behind clustering.
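To make the idea concrete, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. It is not the code from the notebook. As a simplification it uses the first `k` points as initial centroids, which keeps the example deterministic; real implementations usually pick the starting centroids at random or with a smarter scheme.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points (tuples of equal length) into k groups.

    Returns (centroids, labels). Assumption for this sketch: the first
    k points serve as initial centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the centroid of an empty cluster where it is.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


data = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
print(labels)  # prints [0, 1, 0, 0, 1, 1]
```

One of the caveats shows up directly in the code: the result depends on the initial centroids, so a different starting choice can end in a different clustering.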
The lecture slides: Lecture 14
The Python notebook: cse891 – clustering I.ipynb
The CSV test file for Weka: test