Author: ArendHintze

Issues with running Pig

The first issue some of you might run into is “safe mode” on the namenode. Sometimes the image starts with the namenode already in safe mode, and sometimes it enters safe mode spontaneously, and I don’t know why…

Always start the Cloudera Manager first. I right-click it and open the manager in a second browser tab, which makes starting HUE easier later. Anyway, once you have started the manager, look for the HDFS1 entry, which should show a green circle next to it. In the same row you see a link called namenode 1, which you need to click. In the namenode tab you see a button called “Actions”. From that button choose “Leave Safemode” and click it. Once you have confirmed the action, wait a couple of seconds until it completes successfully.
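If you prefer the command line over clicking through the manager, the same check can be sketched in Python around the standard `hdfs dfsadmin -safemode` command. This is only a sketch under assumptions: the helper names are made up for illustration, the `hdfs` binary has to be on the PATH, and on the course image you may need to run it as the HDFS superuser.

```python
import subprocess


def safe_mode_is_on(status_output: str) -> bool:
    """Interpret the text printed by `hdfs dfsadmin -safemode get`.

    The command reports a line such as "Safe mode is ON" or "Safe mode is OFF".
    """
    return "safe mode is on" in status_output.lower()


def run_safemode(action: str) -> str:
    """Run `hdfs dfsadmin -safemode <action>` and return what it printed.

    `action` is expected to be "get" or "leave".
    """
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", action],
        capture_output=True,
        text=True,
        check=True,  # raise if the command itself fails
    )
    return result.stdout


def ensure_safe_mode_off() -> None:
    """Leave safe mode only if the namenode reports that it is on."""
    if safe_mode_is_on(run_safemode("get")):
        print(run_safemode("leave"))
    else:
        print("Namenode is not in safe mode, nothing to do.")


if __name__ == "__main__":
    ensure_safe_mode_off()
```

The parsing is kept separate from the command call so you can check it without a running cluster.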

In order to start HUE, either switch back to the first tab or go back until you see the start menu again. Click on “HUE”. This will start the user interface. The top left button on the title row is called “About HUE”. Click it and you will get the “Quick Start Wizard”, which the first time will show a couple of red error messages. The one that causes most of the problems is oozie not working. Either click on “next” or on “examples”. Here you can install various missing components. Click “all” because that installs all the missing components. Once that is installed keep clicking “next” until you reach “Use the applications” and start “Hue Home”.

Once that is done you need to restart your system.

Now that you have restarted, check that the namenode is not in safe mode, and while you are there, check whether oozie is active; if it is not, make it active.

If you now start HUE you should be fine and can start Pig without any more issues.

Cheers Arend

Example Questions

These are a couple of example questions. I will of course ask more, and about different topics not included in this example. The questions here are intended to illustrate the kind of questions asked. Imagine these kinds of questions, but about all topics covered.

    1. You have to compare three cars, and each car has different features. Which two cars are most similar, using the Jaccard index?

       • A: turbo, radio, extra suspensions, 3 year warranty
       • B: radio, usb port, 3 year warranty
       • C: no warranty

    2. If you had used the simple matching coefficient, would the result be different?

    3. What are the three Vs in big data analysis?

    4. Classify the three cars from the first question using the following rule based classifier:

       • R1: has a radio, has a turbo -> buy
       • R2: has no warranty, has no usb port -> don’t buy
       • default: -> wreck

    5. Imagine a tree building algorithm, and the data set is the one from the car example.

       • List all item sets of size 2 with a support over 50%.
       • What is the confidence of the following rule: {has warranty}->{usb port}?

    6. I have the following linear equation: f(x)=0.5x+2.0
       Is this the right fit for the following data set:

       X   Y
       1   0.2
       2   4.0
       3   1.0
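
One way to approach the last question is to compare the squared error of the proposed line with that of the ordinary least-squares line through the same three points. The sketch below assumes that “right fit” means the least-squares fit, which the question itself does not state; the function names are only illustrative.

```python
# Data set from the question above.
xs = [1.0, 2.0, 3.0]
ys = [0.2, 4.0, 1.0]


def sum_squared_error(slope, intercept, xs, ys):
    """Sum of squared residuals of the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form ordinary least-squares fit for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Error of the proposed line f(x) = 0.5x + 2.0.
proposed_sse = sum_squared_error(0.5, 2.0, xs, ys)

# Error of the least-squares line, for comparison.
best_slope, best_intercept = least_squares_line(xs, ys)
best_sse = sum_squared_error(best_slope, best_intercept, xs, ys)

print(f"proposed line: SSE = {proposed_sse:.2f}")
print(f"least squares: y = {best_slope:.2f}x + {best_intercept:.2f}, SSE = {best_sse:.2f}")
```

If the two errors differ noticeably, the proposed line is not the least-squares fit for this data. Whether a straight line is a sensible model for three points this scattered is a separate question.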