After setting up Hadoop as described earlier, you have a bare cluster: only YARN, MapReduce, and HDFS. These are rarely used directly; instead, other Hadoop-ecosystem components are deployed on top to make the cluster usable. For example, data stored in HDFS is just files, which are inconvenient to work with directly; Hive maps HDFS data into table structures so it can be manipulated with plain SQL. Likewise, distributed computation with MapReduce normally requires writing MapReduce programs by hand, which is complex; with Hive, the same MapReduce work can be expressed as SQL statements.
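To make this concrete, here is an illustrative HiveQL sketch (the table name, column, and HDFS path are examples only, not part of the installation below): a word count that would otherwise be a hand-written MapReduce job reduces to a few lines of SQL.

```sql
-- Map a plain-text file already in HDFS onto a one-column table
-- (the path and table name are hypothetical examples).
CREATE EXTERNAL TABLE doc (line STRING)
LOCATION '/user/hive/warehouse/doc';

-- Word count in one statement; Hive compiles this into MapReduce jobs.
SELECT word, COUNT(*) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM doc) w
GROUP BY word;
```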
Note: first make sure the fully distributed Hadoop environment is up and running. Hive itself only needs to be installed on the NameNode.
# cd /opt                                  # the Hive tarball was uploaded to /opt
# tar -xzvf apache-hive-2.1.1-bin.tar.gz   # unpack the archive
# mv apache-hive-2.1.1-bin hive2.1.1       # rename the directory to hive2.1.1
# vim /etc/profile   # edit the environment-variable configuration
export JAVA_HOME=/opt/jdk1.8
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HIVE_HOME=/opt/hive2.1.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
export CLASSPATH=.:$HIVE_HOME/lib:$CLASSPATH
export PATH=$PATH:$HIVE_HOME/bin
# source /etc/profile   # apply the new configuration
# cd /opt/hive2.1.1/conf/
# cp hive-default.xml.template hive-site.xml
Note: the directories /user/hive/warehouse and /tmp/hive must be created in HDFS first, because they are referenced when editing hive-site.xml.
# hdfs dfs -mkdir -p /user/hive/warehouse   # create the warehouse directory
# hdfs dfs -chmod 777 /user/hive/warehouse  # open up permissions on it
# hdfs dfs -mkdir -p /tmp/hive/             # create the scratch directory
# hdfs dfs -chmod 777 /tmp/hive             # open up permissions on it
In hive-site.xml, replace every occurrence of ${system:java.io.tmpdir} with /opt/hive2.1.1/tmp (create this local directory manually; note there is no trailing slash, since the template already appends /${system:user.name} after it), and replace every occurrence of ${system:user.name} with root.
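Doing this by hand across the whole file is error-prone; a sed one-liner can perform both substitutions at once (paths here match this guide; back up the file first):

```shell
cd /opt/hive2.1.1/conf
mkdir -p /opt/hive2.1.1/tmp        # the local scratch directory must exist
cp hive-site.xml hive-site.xml.bak # keep a backup before editing in place
# Replace both placeholders; '|' is used as the sed delimiter because the
# replacement values contain '/'.
sed -i 's|${system:java.io.tmpdir}|/opt/hive2.1.1/tmp|g; s|${system:user.name}|root|g' hive-site.xml
```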
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>${system:java.io.tmpdir}/${system:user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>${system:java.io.tmpdir}/${system:user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
After the replacement:
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/opt/hive2.1.1/tmp/root</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/opt/hive2.1.1/tmp/${hive.session.id}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/opt/hive2.1.1/tmp/root</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/opt/hive2.1.1/tmp/root/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
Metastore settings in hive-site.xml: set the value of javax.jdo.option.ConnectionDriverName to the MySQL driver class, javax.jdo.option.ConnectionURL to the MySQL JDBC address, javax.jdo.option.ConnectionUserName to the MySQL login user, and javax.jdo.option.ConnectionPassword to the MySQL login password:
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.210.70:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>11111</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>Enforce metastore schema version consistency. True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default) False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.</description>
</property>
After downloading the MySQL JDBC driver (mysql-connector-java, which provides com.mysql.jdbc.Driver), upload the jar to /opt/hive2.1.1/lib.
# cd /opt/hive2.1.1/conf
# cp hive-env.sh.template hive-env.sh
Open hive-env.sh and append the following:
export HADOOP_HOME=/opt/hadoop2.6.0
export HIVE_CONF_DIR=/opt/hive2.1.1/conf
export HIVE_AUX_JARS_PATH=/opt/hive2.1.1/lib
# cd /opt/hive2.1.1/bin
# schematool -initSchema -dbType mysql   # initialize the metastore database