Installing Hadoop, Spark and Elasticsearch

on a Single-node/Multi-node cluster with Ubuntu 14.04


Sabeur Aridhi sabeur.aridhi@telecomnancy.eu
In this document we present tutorials for:
  1. Hadoop
  2. Spark
  3. ElasticSearch

Tutorial 1: Hadoop and HDFS

  1. Setup a Hadoop cluster
    Download VirtualBox 4.3.x from the following link https://www.virtualbox.org/wiki/Downloads and create a VM instance with the configuration below:
    Name: Hadoop
    System type: Ubuntu
    CPU: 2 cores
    RAM: 4 GB
    Disk: 15 GB
    Use NAT option to connect to the Internet from your virtual machine
  2. Download & Install Ubuntu in the VM instance
    Download Ubuntu 14.04 LTS (Desktop version) from this link, mount the ISO on the VM's CD drive and boot the system. During installation, set the machine name, user name and password to hadoop. When the installation is complete, turn off the VM and unmount the ISO.
  3. Install Guest Additions
    Use one of the following two options to install the Guest Additions for VirtualBox:
    Option A
    Guest Additions iso can be found in VirtualBox installation path, usually:

    /usr/share/virtualbox/
    Mount the VirtualBox Guest Additions ISO in the VM's CD drive before starting the VM, turn the system on, open a terminal, execute the following command and reboot:

    sh /media/hadoop/VBOXADDITIONS_4.3.34_104062/autorun.sh 

    Option B
    1. Start the virtual machine.
    2. In the VirtualBox application, open the "Devices" menu.
    3. Select "Insert Guest Additions CD image…".
    4. Follow the installation dialogues.
    5. Reboot.
  4. Install important packages and JAVA JDK
    Update the package repository:
    $ sudo apt-get update
    Install a few things:
    $ sudo apt-get install build-essential uuid-dev autoconf rsync
    $ sudo apt-get install aptitude 
    Search for java and install it:
    $ sudo aptitude search openjdk-7
    $ sudo apt-get install openjdk-7-jdk 
    The version installed should be 1.7.0, check with:
    $ javac -version 
    We get the path of our installed java with:
    $ update-java-alternatives -l 
    Edit $HOME/.bashrc and add the following at the end:
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64 
    (or whatever path we got from update-java-alternatives)
    export PATH=$PATH:$JAVA_HOME/bin 
    Force the system to read the updates in .bashrc with:
    $ source .bashrc 
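    A quick sanity check that the new variables are picked up (assuming the JAVA_HOME path set above):
    $ echo $JAVA_HOME
    $ $JAVA_HOME/bin/javac -version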
  5. Python version and packages
    Start the Python shell and check the version (should be 2.7.x):
    $ python
    - You can quit Python with the command "exit()"
    - Install numpy and scipy:
    $ sudo apt-get install python-numpy
    $ sudo apt-get install python-scipy 
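    A quick check that both packages import correctly (assuming the installs above finished without errors):
    $ python -c "import numpy, scipy; print numpy.__version__, scipy.__version__"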
  6. SSH Setup
    Install the SSH server:
    $ sudo apt-get install openssh-server 
    Create RSA key:
    $ ssh-keygen -t rsa -P "" (just press enter when asked for filename)
    $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
    $ chmod 0600 /home/hadoop/.ssh/authorized_keys 
    Check connectivity to local machine:
    $ ssh localhost
    $ logout 
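    For a multi-node cluster, the master must also reach every slave without a password. A minimal sketch, assuming a slave reachable under the hypothetical hostname hadoop-slave1 with the same hadoop user:
    $ ssh-copy-id hadoop@hadoop-slave1
    $ ssh hadoop@hadoop-slave1
    $ logout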
  7. Download Hadoop
    Download Hadoop 2.6.3:
    $ wget ftp://ftp.funet.fi/pub/mirrors/apache.org/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz
    Extract it
    $ cd Downloads
    $ tar -xzf hadoop-2.6.3.tar.gz
  8. Configure Hadoop and HDFS
    Edit $HOME/.bashrc:
    $ nano $HOME/.bashrc 
    and add the following lines at the end of the file:
    export HADOOP_HOME=$HOME/Downloads/hadoop-2.6.3/
    export PATH=$PATH:$HADOOP_HOME/bin
    Force the system to read the updates in .bashrc with:
    $ source $HOME/.bashrc
    Edit the $HADOOP_HOME/etc/hadoop/slaves file and add the line:
    hadoop 
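    For a multi-node cluster, the slaves file would instead list one worker per line, for example (hypothetical names of clones of the hadoop machine):
    hadoop
    hadoop-slave1
    hadoop-slave2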
    Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh: Find line "export JAVA_HOME=" and add the complete java path (same as in step 4)
    In file etc/hadoop/core-site.xml (in this and the following files, the <property> elements go inside the existing <configuration> element):
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    In file etc/hadoop/mapred-site.xml (if it does not exist, copy it from mapred-site.xml.template):
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    In file etc/hadoop/hdfs-site.xml:
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    In file etc/hadoop/yarn-site.xml:
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    Shut down the machine.
    The basic configuration is done. The machine named hadoop will be the master node. We can clone this machine as many times as needed to build a multi-node cluster.
  9. Format HDFS
    In your master node (the hadoop machine) execute:
    $ cd $HADOOP_HOME
    $ bin/hdfs namenode -format
  10. Run Hadoop from master (hadoop)
    Start Hadoop:
    $ ./Downloads/hadoop-2.6.3/sbin/start-all.sh
    Check in the UI that Hadoop is up:
    Open a browser and enter "localhost:50070" in the address bar.
    The UI should list the cluster machines.
    Check also the Datanodes section: the disk capacity available on each machine should be shown.
    Alternative check to see what is running (can be executed on master and slaves):
     $ jps (shows all Hadoop-related daemons)
    Hadoop can be stopped by calling the stop-all.sh script in the same path as start-all.sh.
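    As a command-line alternative to the web UI, the HDFS report also lists the live datanodes and their capacity (assuming HADOOP_HOME is set as above):
    $ $HADOOP_HOME/bin/hdfs dfsadmin -report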
  11. Test Hadoop
    Please use the file bigtext.txt provided to you at the beginning of the JABD'16 session. Put the file in your home directory. You can also use other local files on your machine.
    Put the file bigtext.txt in HDFS using:
    $ ./Downloads/hadoop-2.6.3/bin/hadoop fs -mkdir /test1
    $ ./Downloads/hadoop-2.6.3/bin/hadoop fs -put ~/bigtext.txt /test1
    $ ./Downloads/hadoop-2.6.3/bin/hadoop fs -ls /test1
    We can also check in the Hadoop UI that the file is uploaded
    - Test a hadoop example:
    $ cd $HADOOP_HOME
    $ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount /test1/bigtext.txt /test1/output
    
    The test should complete without errors.
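    To inspect the result, list the output directory and print the beginning of the result file (part-r-00000 is the usual MapReduce output file name; it may differ):
    $ ./bin/hadoop fs -ls /test1/output
    $ ./bin/hadoop fs -cat /test1/output/part-r-00000 | head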
    Now we can stop Hadoop:
    $ ./Downloads/hadoop-2.6.3/sbin/stop-all.sh

Tutorial 2: Installing Spark

  1. Download Spark
    Start a browser in your virtual machine and download Spark 1.6.0 from this link (Release: 1.6.0, Package type: pre-built for Hadoop 2.4 and later). Download the tar file (the FTP server is a fast option) and extract it:
    $ cd Downloads
    $ tar -xzf spark-1.6.0-bin-hadoop2.4.tgz
    Reduce logging verbosity as in the single-node guide (one possible way is sketched below).
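    One possible way to do this (a sketch, assuming the default Spark conf layout) is to copy the log4j template and lower the console log level from INFO to WARN:
    $ cd ~/Downloads/spark-1.6.0-bin-hadoop2.4/conf
    $ cp log4j.properties.template log4j.properties
    $ nano log4j.properties (change "log4j.rootCategory=INFO, console" to "log4j.rootCategory=WARN, console")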
  2. Configure Spark [only in case of a Multi-Node Cluster]
    In the case of a multi-node cluster, edit the slaves file for Spark:
    $ cd Downloads/spark-1.6.0-bin-hadoop2.4/conf
    $ cp slaves.template slaves
    $ nano slaves
    Add the names of your cluster machines to the file, one per line (in our case it is a single machine):
    machine 1
    machine 2
    ...
  3. Download and configure Hadoop and HDFS (Tutorial 1)
  4. Run Spark from master (spark)
    Start Spark:
    $ ./Downloads/spark-1.6.0-bin-hadoop2.4/sbin/start-all.sh
    Check in the UI that Spark is up: open a browser and enter "localhost:8080" in the address bar. The UI should list the cluster machines.
    - Start Hadoop:
    $ ./Downloads/hadoop-2.6.3/sbin/start-all.sh
    Check in the UI that Hadoop is up: open a browser and enter "localhost:50070" in the address bar. The UI should list the cluster machines. Check also the Datanodes section: the disk capacity available on each machine should be shown.
    - Alternative check to see what is running (can be executed on master and slaves):
    $ jps (shows all Spark- and Hadoop-related daemons)
    - Both platforms can be stopped by calling the stop-all.sh scripts in the same paths as the start-all.sh scripts.
  5. Test Spark
    Put some files in HDFS (see Tutorial 1).
    - Test Spark wordcount on the "bigtext.txt" file:
    $ cd ~/Downloads/spark-1.6.0-bin-hadoop2.4
    $ ./bin/spark-submit --name "test1" --master spark://hadoop:7077 examples/src/main/python/wordcount.py hdfs://hadoop:9000/test1/bigtext.txt 
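    As an additional quick check that jobs actually run on the cluster, the bundled Python Pi example can be submitted in the same way (same assumptions about paths and master URL as above):
    $ ./bin/spark-submit --name "pi" --master spark://hadoop:7077 examples/src/main/python/pi.py 10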
    Now we can stop Spark and Hadoop:
    $ ./Downloads/spark-1.6.0-bin-hadoop2.4/sbin/stop-all.sh
    $ ./Downloads/hadoop-2.6.3/sbin/stop-all.sh

Tutorial 3: Installing Elasticsearch

  1. Download Elasticsearch
    curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.tar.gz
    Extract it:
    tar -xvf elasticsearch-5.1.1.tar.gz
  2. Elasticsearch settings
    Two configuration files can be found under ES-HOME/config: elasticsearch.yml and logging.yml. Open elasticsearch.yml and edit some variables to customize the Elasticsearch server:
    cluster.name: mycluster
    node.name: node1
    Here, we specify the node name since we can install Elasticsearch on several nodes. We can set the property dynamically using:
    node.name: ${HOSTNAME}
    To enable Elasticsearch to support cluster mode, we need to set the list of nodes of our cluster, using either host names:
    discovery.zen.ping.unicast.hosts: ["host1", "host2"]
    or IP addresses:
    discovery.zen.ping.unicast.hosts: ["IP1", "IP2", "IP3"]
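    Putting these settings together, a minimal config/elasticsearch.yml for a small cluster could look like this (host1 and host2 are hypothetical machine names; network.host is only needed in a multi-node setup so that other nodes can reach this node):
    cluster.name: mycluster
    node.name: ${HOSTNAME}
    network.host: 0.0.0.0
    discovery.zen.ping.unicast.hosts: ["host1", "host2"]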
    Start the Elasticsearch cluster. To start Elasticsearch we use the following commands:
    cd ES-HOME
    ./bin/elasticsearch
    To check our Elasticsearch cluster we can send an HTTP request to port 9200:
    curl http://localhost:9200
    You should get a result like this:
    {
    	"name" : "node-1",
    	"cluster_name" : "my-cluster",
    	"version" : {
    		"number" : "2.3.3",
    		"build_hash" : "218bdf10790eef486ff2c41a3df5cfa32dadcfde",
    		"build_timestamp" : "2016-05-17T15:40:04Z",
    		"build_snapshot" : false,
    		"lucene_version" : "5.5.0"
    	},
    	"tagline" : "You Know, for Search"
    }
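    We can also check the health of the cluster (number of nodes, shard status) with the _cluster/health endpoint:
    curl 'http://localhost:9200/_cluster/health?pretty'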
          
  3. Use Elasticsearch
    Elasticsearch provides a RESTful web service to answer user CRUD requests (Create, Read, Update and Delete). For this, we use the curl command-line tool. We can add our first entry with the command:
    curl -X POST 'http://localhost:9200/database/table/id' -d '{"first name":"mohamed","last name":"tounsi" }'
    with:
    database: the index (the database name in RDBMS terms)
    table: the type (the table in RDBMS terms)
    id: the ID of the document
    Output:
    {"_index":"database","_type":"table","_id":"id","_version":1,"created":true}
    List all indices in Elasticsearch:
    curl 'localhost:9200/_cat/indices?v'
    Get the mapping of an index in Elasticsearch:
    curl -XGET 'http://localhost:9200/index/_mapping'

    Search in elasticsearch:

    To facilitate querying Elasticsearch we can use Sense, a Google Chrome extension; the requests below use its console syntax. Get the list of all documents in an index:
    GET /index/type/_search
    {
    	"query": {
    	"match_all": {}
    	}
    }
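    Without the Sense extension, the same search can be sent with curl by passing the JSON body to the _search endpoint:
    curl -XGET 'http://localhost:9200/index/type/_search?pretty' -d '{"query": {"match_all": {}}}'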
          
    Get the first two documents in an Elasticsearch index:
    GET /index/type/_search
    {
    	"query": {
    		"match_all": {}
    	},
    	"size": 2
    }
          
    Sort the results returned by a query
    GET /index/type/_search
    {
    	"query": {
    		"match_all": {}
    	},
    	"sort": {
    		"FIELD": {
    			"order": "desc"
    		}
    	}
    }
          
    Get a subset of fields:
    GET /index/type/_search
    {
    	"query": {
    		"match_all": {}
    	},
    "_source":["filed","filed"]
    }
          
    Count function:
    GET /index/type/_count
    {
    	"query": {
    		"match_all": {}
    	}
    }
    Query and filter in Elasticsearch
    In Elasticsearch, we have two types of queries:
    Query: applied mainly to text fields; each result is given a relevance score, because Elasticsearch ranks results by how well they match.
    Filter: applied to exact values such as numeric fields or ranges; a filter answers a yes/no question and does not compute a score.
    Example of Query :
    GET /index/type/_search
    {
    	"query": {
    		"bool": {
    			"must": [
    			{"match": {"field":"text"}},{"match": {"field":"text"}}
    			]
    		}
    	}
    }
    The same kind of query with "should" instead of "must" requires only one of the clauses to match:
    GET /bank/account/_search
    {
    	"query": {
    		"bool": {
    			"should": [
    				{ "match": { "field": "text" } },
    				{ "match": { "field": "text" } }
    			]
    		}
    	}
    }
    
    Example of Filter :
    GET /index/type/_search
    {
    	"query": {
    		"filtered": {
    			"filter": {
    				"range": {
    					"numeric field": {
    						"from": val1,
    						"to": val2
    					}
    				}
    			}
    		}
    	}
    }
    GET /index/type/_search
    {
    	"query": {
    		"filtered": {
    			"filter": {
    				"range": {
    					"numeric field": {
    						"gte": val 1,
    						"lss": val 2
    					}
    				}
    			}
    		}
    	}
    }
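    Note that the "filtered" query shown above works in Elasticsearch 2.x but was removed in 5.x; an equivalent form that works in both versions wraps the filter in a bool query:
    GET /index/type/_search
    {
    	"query": {
    		"bool": {
    			"filter": {
    				"range": {
    					"numeric field": {
    						"gte": val1,
    						"lte": val2
    					}
    				}
    			}
    		}
    	}
    }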