add tl;dr version of Spark README.md

- mention the Spark cluster is standalone
- add detailed master & worker instructions
- add method to get master status
- add links option for master status
- add links option for worker status
- add example use of cluster
- add source location
parent: 37689038d2
commit: 31b923c987
`examples/spark/README.md` (new file, 173 lines)
# Spark example

Following this example, you will create a functional [Apache
Spark](http://spark.apache.org/) cluster using Kubernetes and
[Docker](http://docker.io).

You will set up a Spark master service and a set of Spark workers using
Spark's [standalone mode](http://spark.apache.org/docs/latest/spark-standalone.html).

For the impatient expert, jump straight to the [tl;dr](#tldr) section.

### Sources

Source is freely available at:
* Docker image - https://github.com/mattf/docker-spark
* Docker Trusted Build - https://registry.hub.docker.com/search?q=mattf/spark

## Step Zero: Prerequisites

This example assumes you have a Kubernetes cluster installed and
running, and that you have installed the ```kubectl``` command line
tool somewhere in your path. Please see the [getting started
guides](../../docs/getting-started-guides) for installation
instructions for your platform.
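Before moving on, it may help to confirm that ```kubectl``` can actually
reach your cluster. A minimal sanity check (exact output varies by
release):

```shell
$ kubectl version
$ kubectl get pods
```

If both commands answer without connection errors, the client is wired
up correctly.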
## Step One: Start your Master service

The Master service is the master (or head) service for a Spark
cluster.

Use the `examples/spark/spark-master.json` file to create a pod running
the Master service.

```shell
$ kubectl create -f examples/spark/spark-master.json
```

Then, use the `examples/spark/spark-master-service.json` file to
create a logical service endpoint that Spark workers can use to access
the Master pod.

```shell
$ kubectl create -f examples/spark/spark-master-service.json
```

Ensure that the Master service is running and functional.
### Check to see if Master is running and accessible

```shell
$ kubectl get pods,services
POD            IP              CONTAINER(S)   IMAGE(S)             HOST                        LABELS              STATUS
spark-master   192.168.90.14   spark-master   mattf/spark-master   172.18.145.8/172.18.145.8   name=spark-master   Running
NAME            LABELS                                    SELECTOR            IP               PORT
kubernetes      component=apiserver,provider=kubernetes   <none>              10.254.0.2       443
kubernetes-ro   component=apiserver,provider=kubernetes   <none>              10.254.0.1       80
spark-master    name=spark-master                         name=spark-master   10.254.125.166   7077
```

Connect to http://192.168.90.14:8080 to see the status of the master.

```shell
$ links -dump 192.168.90.14:8080
   [IMG] 1.2.1 Spark Master at spark://spark-master:7077

     * URL: spark://spark-master:7077
     * Workers: 0
     * Cores: 0 Total, 0 Used
     * Memory: 0.0 B Total, 0.0 B Used
     * Applications: 0 Running, 0 Completed
     * Drivers: 0 Running, 0 Completed
     * Status: ALIVE
...
```

(Pull requests welcome for an alternative that uses the service IP and
port)
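If `links` is not installed, any HTTP client that can reach the pod
network will do. A hedged sketch with `curl`, reusing the pod IP
reported by `kubectl get pods` above:

```shell
$ curl -s http://192.168.90.14:8080 | grep -i alive
```

A non-empty match means the master's web UI is up and reporting
`ALIVE`.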
## Step Two: Start your Spark workers

The Spark workers do the heavy lifting in a Spark cluster. They
provide execution resources and data cache capabilities for your
program.

The Spark workers need the Master service to be running.

Use the `examples/spark/spark-worker-controller.json` file to create a
ReplicationController that manages the worker pods.

```shell
$ kubectl create -f examples/spark/spark-worker-controller.json
```
### Check to see if the workers are running

```shell
$ links -dump 192.168.90.14:8080
   [IMG] 1.2.1 Spark Master at spark://spark-master:7077

     * URL: spark://spark-master:7077
     * Workers: 3
     * Cores: 12 Total, 0 Used
     * Memory: 20.4 GB Total, 0.0 B Used
     * Applications: 0 Running, 0 Completed
     * Drivers: 0 Running, 0 Completed
     * Status: ALIVE

   Workers

   Id                                          Address               State   Cores        Memory
   worker-20150318151745-192.168.75.14-46422   192.168.75.14:46422   ALIVE   4 (0 Used)   6.8 GB (0.0 B Used)
   worker-20150318151746-192.168.35.17-53654   192.168.35.17:53654   ALIVE   4 (0 Used)   6.8 GB (0.0 B Used)
   worker-20150318151746-192.168.90.17-50490   192.168.90.17:50490   ALIVE   4 (0 Used)   6.8 GB (0.0 B Used)
...
```

(Pull requests welcome for an alternative that uses the service IP and
port)
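To grow or shrink the cluster, adjust the replica count on the
controller rather than creating pods by hand. A hedged sketch;
depending on your kubectl release the verb is spelled `resize` (older)
or `scale` (newer):

```shell
$ kubectl scale rc spark-worker-controller --replicas=6
```

Re-run the `links -dump` check afterwards; the worker count should
follow the replica count.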
## Step Three: Do something with the cluster

```shell
$ kubectl get pods,services
POD                             IP              CONTAINER(S)   IMAGE(S)             HOST                          LABELS                                STATUS
spark-master                    192.168.90.14   spark-master   mattf/spark-master   172.18.145.8/172.18.145.8     name=spark-master                     Running
spark-worker-controller-51wgg   192.168.75.14   spark-worker   mattf/spark-worker   172.18.145.9/172.18.145.9     name=spark-worker,uses=spark-master   Running
spark-worker-controller-5v48c   192.168.90.17   spark-worker   mattf/spark-worker   172.18.145.8/172.18.145.8     name=spark-worker,uses=spark-master   Running
spark-worker-controller-ehq23   192.168.35.17   spark-worker   mattf/spark-worker   172.18.145.12/172.18.145.12   name=spark-worker,uses=spark-master   Running
NAME            LABELS                                    SELECTOR            IP               PORT
kubernetes      component=apiserver,provider=kubernetes   <none>              10.254.0.2       443
kubernetes-ro   component=apiserver,provider=kubernetes   <none>              10.254.0.1       80
spark-master    name=spark-master                         name=spark-master   10.254.125.166   7077

$ sudo docker run -it mattf/spark-base sh

sh-4.2# echo "10.254.125.166 spark-master" >> /etc/hosts

sh-4.2# export SPARK_LOCAL_HOSTNAME=$(hostname -i)

sh-4.2# MASTER=spark://spark-master:7077 pyspark
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.2.1
      /_/

Using Python version 2.7.5 (default, Jun 17 2014 18:11:42)
SparkContext available as sc.
>>> import socket, resource
>>> sc.parallelize(range(1000)).map(lambda x: (socket.gethostname(), resource.getrlimit(resource.RLIMIT_NOFILE))).distinct().collect()
[('spark-worker-controller-ehq23', (1048576, 1048576)), ('spark-worker-controller-5v48c', (1048576, 1048576)), ('spark-worker-controller-51wgg', (1048576, 1048576))]
```
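Two details in that session are worth spelling out: the ```/etc/hosts```
entry makes the name the master advertises (`spark-master`) resolve to
the service IP, and `SPARK_LOCAL_HOSTNAME` makes the driver advertise
an address the workers can reach back. As another quick smoke test, a
hedged Monte Carlo estimate of pi in the same `pyspark` session:

```shell
>>> import random
>>> n = 1000000
>>> inside = sc.parallelize(xrange(n)).filter(lambda _: random.random() ** 2 + random.random() ** 2 < 1).count()
>>> print 4.0 * inside / n   # should land near 3.14
```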
## tl;dr

```kubectl create -f spark-master.json```

```kubectl create -f spark-master-service.json```

Make sure the Master Pod is running (use: ```kubectl get pods```).

```kubectl create -f spark-worker-controller.json```
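When you are finished, tear the example down in reverse order. A hedged
sketch; on older kubectl releases you may need ```kubectl stop rc```
(or a resize to 0 replicas) so the controller's pods are cleaned up
too:

```shell
$ kubectl delete rc spark-worker-controller
$ kubectl delete service spark-master
$ kubectl delete pod spark-master
```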
`examples/spark/spark-master-service.json` (new file, 9 lines)
```json
{
  "id": "spark-master",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 7077,
  "containerPort": 7077,
  "selector": { "name": "spark-master" },
  "labels": { "name": "spark-master" }
}
```
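This v1beta1 service is what gives the master its stable virtual IP
(10.254.125.166 in the output above): port 7077 is forwarded to
whichever pod carries the `name=spark-master` label. A hedged way to
read the allocated IP back for the ```/etc/hosts``` trick in Step
Three:

```shell
$ kubectl get services spark-master
```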
`examples/spark/spark-master.json` (new file, 20 lines)
```json
{
  "id": "spark-master",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "spark-master",
      "containers": [{
        "name": "spark-master",
        "image": "mattf/spark-master",
        "cpu": 100,
        "ports": [{ "containerPort": 7077 }]
      }]
    }
  },
  "labels": {
    "name": "spark-master"
  }
}
```
`examples/spark/spark-worker-controller.json` (new file, 28 lines)
```json
{
  "id": "spark-worker-controller",
  "kind": "ReplicationController",
  "apiVersion": "v1beta1",
  "desiredState": {
    "replicas": 3,
    "replicaSelector": {"name": "spark-worker"},
    "podTemplate": {
      "desiredState": {
        "manifest": {
          "version": "v1beta1",
          "id": "spark-worker-controller",
          "containers": [{
            "name": "spark-worker",
            "image": "mattf/spark-worker",
            "cpu": 100,
            "ports": [{"containerPort": 8888, "hostPort": 8888}]
          }]
        }
      },
      "labels": {
        "name": "spark-worker",
        "uses": "spark-master"
      }
    }
  },
  "labels": {"name": "spark-worker"}
}
```
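One consequence of the `hostPort` 8888 claim: no two workers can bind
the same port on one node, so the 3 replicas above need at least 3
schedulable nodes. A hedged check that the pods actually spread out
(label selector flags vary by kubectl release):

```shell
$ kubectl get pods -l name=spark-worker
```

The `HOST` column in the output should show a different node for each
worker.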