第三节:Ubuntu 16.04安装Spark¶
一、安装Scala¶
scala 下载与安装¶
wget http://scala-lang.org/files/archive/scala-2.11.12.tgz
tar xcf scala-2.11.12.tgz
sudo mv scala-2.11.12 /usr/local/scala
# 编辑~/.bashrc文件
sudo gedit ~/.bashrc
# 末尾添加Scala配置
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:${SCALA_HOME}/bin
# ~/.bashrc修改生效
source ~/.bashrc
启动scala¶
issue:cat /usr/lib/jvm/java-8-openjdk-amd64//release:没有那个文件或目录
solution:更换openjdk为oracle jdk
二、在Ubuntu 16.04上正确安装Oracle Java¶
1. 将webupd8team存储库添加到apt¶
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
- 注:运行第二个命令出现No module named “apt_pkg”错误
- 解决方法:由于更换python版本3.6,所以出现以下问题,依次运行以下代码即可解决
sudo apt install python3-apt
cd /usr/lib/python3/dist-packages
sudo cp apt_pkg.cpython-35m-x86_64-linux-gnu.so apt_pkg.cpython-36m-x86_64-linux-gnu.so
2.安装 Java¶
sudo apt-get install oracle-java8-installer
5.自此完全正常¶
三、安装&配置Spark¶
1.下载与安装¶
# 下载
wget http://mirrors.hust.edu.cn/apache/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
# 解压
tar zxf spark-2.2.1-bin-hadoop2.7.tgz
# 目录移动
sudo mv spark-2.2.1-bin-hadoop2.7 /usr/local/spark
# 编辑~/.bashrc
sudo gedit ~/.bashrc
# 配置Spark--复制以下两行至~/.bashrc文件中
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:${SPARK_HOME}/bin
# ~/.bashrc配置生效
source ~/.bashrc
# 进入spark-shell交互界面
spark-shell
2.设置spark-shell显示信息¶
# 切换到spark配置文件目录
cd /us/local/spark/conf
# 复制log4j模板文件到log4j.properties
cp log4j.properties.template log4j.properties
# 编辑log4j.properties---将log4j.rootCategory=INFO,console改为log4j.rootCategory=WARN,console
sudo gedit log4j.properties
# 启动spark-shell
此时会发现信息少了很多,只列出重要部分的信息
四、构建Spark Standalone Cluster执行环境¶
# 复制模板文件来创建spark-env.sh
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
# 编辑spark-env.sh
sudo gedit /usr/local/spark/conf/spark-env.sh
# 添加以下内容
export SPARK_MASTER_IP=master
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=800m
export SPARK_WORKER_INSTANCES=2
# 将master的spark程序复制到data1、data2、data3,以下五行重复,注意更换data服务器
ssh data1 # ssh连接至data1
sudo mkdir /usr/local/spark # 连接成功后,创建spark目录
sudo chown hduser:hadoop /usr/local/spark # 更换所有者为hduser
exit # 注销data1
sudo scp -r /usr/local/spark hduser@data1:/usr/local
五、在Spark Standalone运行spark-shell¶
# 启动Spark Standalone Cluster
/usr/local/spark/sbin/start-all.sh
# 在Spark Standalone运行spark-shell
spark-shell --master spark://master:7077
http://master:8080/
### 4.检查此命令的输出,以验证是否正确