Friday, October 21, 2016

Using the Cloudera Docker Container

source: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quickstart_docker_container.html

Cloudera Docker Container

docker pull cloudera/quickstart:latest
latest: Pulling from cloudera/quickstart
1d00652ce734: Downloading [==>                        ] 232.5 MB/4.444 GB

Importing the Cloudera QuickStart Image

You can import the Docker image by pulling it from the Docker Hub:

docker pull cloudera/quickstart:latest

You can also download the image from the Cloudera website. After the file is downloaded and on your host, you can import it into Docker:

tar xzf cloudera-quickstart-vm-*-docker.tar.gz
docker import - cloudera/quickstart:latest < cloudera-quickstart-vm-*-docker/*.tar

Running a Cloudera QuickStart Container

To run a container using the image, you must know the name or hash of the image. If you followed the import instructions above, the name is cloudera/quickstart:latest. The hash is also printed in the terminal when you import, or you can look up the hashes of all imported images with:

docker images

Once you know the name or hash of the image, you can run it:

docker run --hostname=quickstart.cloudera --privileged=true -t -i [OPTIONS] [IMAGE] /usr/bin/docker-quickstart

The required flags and other options are described in the following table:

Option	Description
`--hostname=quickstart.cloudera`	Required: Pseudo-distributed configuration assumes this hostname.
`--privileged=true`	Required: For HBase, MySQL-backed Hive metastore, Hue, Oozie, Sentry, and Cloudera Manager.
`-t`	Required: Allocate a pseudoterminal. Once services are started, a Bash shell takes over. This switch starts a terminal emulator to run the services.
`-i`	Required: Keep STDIN open so that you can use the terminal, either immediately or by attaching to it later.
`-p 8888`	Recommended: Map the Hue port in the guest to another port on the host.
`-p [PORT]`	Optional: Map any other ports (for example, 7180 for Cloudera Manager, 80 for a guided tutorial).
`-d`	Optional: Run the container in the background.

Use /usr/bin/docker-quickstart to start all CDH services and then run a Bash shell. If you want to start services manually, you can run /bin/bash directly instead.
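As an illustration, here is a minimal sketch that assembles the flags from the table above into one command. The ports (8888 for Hue, 7180 for Cloudera Manager, 80 for the guided tutorial) are the ones the table mentions. The function name quickstart_run_cmd is hypothetical, and the function only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print a docker run command for the QuickStart image.
# Nothing is started here; the function only prints the command line.
quickstart_run_cmd() {
    image="${1:-cloudera/quickstart:latest}"   # image name or hash
    printf '%s\n' "docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888 -p 7180 -p 80 $image /usr/bin/docker-quickstart"
}

# Print the command for the default image name.
quickstart_run_cmd
```

To start the container, copy the printed line into your terminal, or pass an image hash as the first argument to print the command for a different image.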

See Networking for details about port mapping.

Connecting to the Docker Shell

If you do not pass the -d flag to docker run, your terminal automatically attaches to the container.

A container dies when you exit the shell, but you can disconnect and leave the container running by typing Ctrl+p followed by Ctrl+q.

If you disconnect from the shell or passed the -d flag on startup, you can connect to the shell later using the following command:

docker attach [CONTAINER HASH]

You can look up the hashes of running containers using the following command:

docker ps

When attaching to a container, you might need to press Enter to see the shell prompt. To disconnect from the terminal without the container exiting, type Ctrl+p followed by Ctrl+q.
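One way to find the hash to pass to docker attach is to filter the docker ps listing by image name. The following is a rough sketch, assuming the container was started from the name cloudera/quickstart:latest (if it was started from a hash, the image column shows that hash instead). The helper find_container is hypothetical, and the sample IDs are made up.

```shell
#!/bin/sh
# Hypothetical helper: read "ID IMAGE" lines on stdin, as produced by
#   docker ps --format '{{.ID}} {{.Image}}'
# and print the ID of the first container that runs the given image.
find_container() {
    awk -v image="$1" '$2 == image { print $1; exit }'
}

# On a host with Docker, the helper could be used like this:
#   id=$(docker ps --format '{{.ID}} {{.Image}}' | find_container cloudera/quickstart:latest)
#   docker attach "$id"

# Demonstration with made-up sample input instead of real docker ps output:
printf '%s\n' 'aaa111 ubuntu:16.04' 'bbb222 cloudera/quickstart:latest' |
    find_container cloudera/quickstart:latest
```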

Networking

To make a port accessible outside the container, pass the -p flag. Docker maps this port to another port on the host system. You can look up the interface to which it binds and the port number it maps to using the following command:

docker port [CONTAINER HASH] [GUEST PORT]
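When a script needs only the host port number, the output of docker port can be trimmed. This sketch assumes the output has the form ADDRESS:PORT, for example 0.0.0.0:32768, which may differ between Docker versions. The helper host_port is hypothetical, and the port number in the demonstration is made up.

```shell
#!/bin/sh
# Hypothetical helper: read docker port output on stdin and print only the
# host port. Assumes lines of the form ADDRESS:PORT.
host_port() {
    # Keep the first line, then strip everything up to the last colon.
    head -n 1 | sed 's/.*://'
}

# On a host with Docker, the helper could be used like this:
#   docker port [CONTAINER HASH] 8888 | host_port

# Demonstration with a made-up sample line:
printf '%s\n' '0.0.0.0:32768' | host_port
```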

To interact with the Cloudera QuickStart image from other systems, make sure quickstart.cloudera resolves to the IP address of the machine where the image is running. You might also want to set up port forwarding so that the port you would normally connect to on a real cluster is mapped to the corresponding port.

When you map ports like this, the services are not aware of the mapping and might provide links or other references to specific ports that are not available on your client.
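One common way to make quickstart.cloudera resolve on a client machine is an entry in its hosts file. This is only a sketch of that approach, not something the Cloudera documentation prescribes. The helper hosts_entry is hypothetical, and 192.0.2.10 is a placeholder address that stands in for the IP address of your Docker host.

```shell
#!/bin/sh
# Hypothetical helper: print a hosts-file line that maps the given IP
# address to the hostname the QuickStart image expects.
hosts_entry() {
    printf '%s quickstart.cloudera\n' "$1"
}

# On a client machine, the line could be appended like this (needs root):
#   hosts_entry 192.0.2.10 | sudo tee -a /etc/hosts

# Demonstration with the placeholder address:
hosts_entry 192.0.2.10
```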

Wednesday, May 20, 2015

Hadoop: Cloudera vs Hortonworks

Hadoop Benchmark: Cloudera vs. Hortonworks vs. MapR
Comparing the top Hadoop distributions
The Hadoop Wars: Cloudera and Hortonworks’ Death Match for Mindshare

source: http://www.experfy.com/blog/cloudera-vs-hortonworks-comparing-hadoop-distributions/

Comparing top three Hadoop distributions: Cloudera vs Hortonworks vs MapR

Cloudera has been around the longest of the three, dating back to the early days of Hadoop. Hortonworks came later. While Cloudera and Hortonworks are 100 percent open source, most versions of MapR come with proprietary modules. Each distribution has its own strengths and weaknesses, and the three also share some overlapping features. If you want to make the most of Hadoop’s immense data processing power, it makes sense to compare the top three Hadoop distributions.

Cloudera

Cloudera Inc. was founded in 2008 by big data geniuses from Facebook, Google, Oracle and Yahoo. It was the first company to develop and distribute Apache Hadoop-based software, and it still has the largest user base and the most clients. Although the core of the distribution is based on Apache Hadoop, it also provides the proprietary Cloudera Management Suite, which automates the installation process and offers other conveniences for users, such as shorter deployment times and a real-time count of nodes.

Cloudera Overview

Hortonworks

Hortonworks, founded in 2011, has quickly emerged as one of the leading Hadoop vendors. Its distribution provides an open source platform based on Apache Hadoop for analysing, storing and managing big data. Hortonworks is the only commercial vendor to distribute complete open source Apache Hadoop without additional proprietary software. Hortonworks’ distribution, HDP 2.0, can be downloaded directly from the company’s website free of charge and is easy to install. Hortonworks engineers are behind most of Hadoop’s recent innovations, including YARN, which improves on MapReduce by allowing more data processing frameworks to run on the cluster.

나의 인생 이야기
