I. Background: the data
1. What data can WiFi capture?
Phone numbers
Organizations
Web page snapshots
Forum posts
Weibo posts
Email
IM chats
Form data
App usage
2. The value of WiFi
Customer experience: convenience for customers; part of the basic infrastructure
Customer data: precision marketing, capturing customers' online behavior, collecting customer information, an additional customer contact channel
3. How WiFi data is acquired
A Wi-Fi network can capture the IMSI numbers of nearby smartphones. The root cause of this wireless tracking and monitoring lies in how smartphones (both Android and iOS devices) connect to Wi-Fi networks.
Two protocols are widely implemented in most modern mobile operating systems:
Extensible Authentication Protocol (EAP)
Authentication and Key Agreement (AKA)
These protocols let a smartphone authenticate to a known Wi-Fi network using its own IMSI number, so the device connects automatically without any interaction from its owner.
4. WiFi data applications
Profiling system
5. Data architecture
6. Data structure
(1) File naming
DataType_Source_UUID.txt
e.g. BASE_SOURCE_UUID.txt
A common set of field standards and type standards must be defined.
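The sketch below shows how such a file name can be split back into its data type, source and UUID parts. The class and the sample file name are illustrative only and are not part of the project code.

```java
// Illustrative only: split a collected file name such as BASE_SOURCE_UUID.txt
// into its data type, source and UUID parts.
public class FileNameParser {

    public static String[] parse(String fileName) {
        // strip the .txt suffix, then split on the underscore separators
        String base = fileName.toLowerCase().endsWith(".txt")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        String[] parts = base.split("_", 3);          // [dataType, source, uuid]
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("wechat_device01_123e4567.txt");   // hypothetical file name
        System.out.println(p[0] + " / " + p[1] + " / " + p[2]);
    }
}
```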
(2) Fields
(3) Common fields
Field | Description | Example / Format | Notes |
---|---|---|---|
imei | IMEI number, the handset's unique identifier | an IMEI consists of 15-17 digits | |
imsi | IMSI, the SIM card's unique identifier | 460011418603055 | 14-15 digits |
longitude | Longitude | accurate to 6 decimal places | |
latitude | Latitude | accurate to 6 decimal places | |
phone_mac | Handset MAC address | format must be normalized (cleaned) to aa-aa-aa-aa-aa-aa (hex characters 0-9, a-f) | |
device_mac | Collection device MAC address | format must be normalized (cleaned) to aa-aa-aa-aa-aa-aa (any digits and letters) | |
device_number | Collection device number | | |
collect_time | Collection time | | |
WeChat data (wechat)
Field | Description | Example / Format | Notes |
---|---|---|---|
username | WeChat nickname | | |
phone | Phone number | | |
object_username | The other party's WeChat ID | | |
send_message | Message sent (encrypted, content cannot be recovered) | | |
accept_message | Message received (encrypted, content cannot be recovered) | | |
message_time | Message time | | |
Email data (mail)
Field | Description | Example / Format | Notes |
---|---|---|---|
send_mail | Sender address | | |
send_time | Send time | | |
accept_mail | Recipient address | | |
accept_time | Receive time | | |
mail_content | Message content | | |
mail_type | Direction: sent or received | send / accept | |
Search data (search)
Field | Description | Example / Format | Notes |
---|---|---|---|
search_content | Search query | | |
search_url | Search URL | | |
search_type | Search engine | | |
search_time | Search time | | |
Base data (base)
Field | Description | Example / Format | Notes |
---|---|---|---|
name | Name | | |
is_marry | Married or not | | |
phone | Phone number | | |
address | Registered (household) address | | |
address_new | Current residential address | | |
birthday | Date of birth | | |
car_number | License plate number | | |
idcard | ID card number | | |
Question: how are the data structure and the data fields determined?
They are determined according to the actual business requirements.
II. Building the base infrastructure
1. Creating the Maven parent project
The top-level pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata2</artifactId>
<packaging>pom</packaging>
<version>1.0-SNAPSHOT</version>
<modules>
<module>xz_bigdata_common</module>
<module>xz_bigdata_es</module>
<module>xz_bigdata_flume</module>
<module>xz_bigdata_hbase</module>
<module>xz_bigdata_kafka</module>
<module>xz_bigdata_redis</module>
<module>xz_bigdata_resources</module>
<module>xz_bigdata_spark</module>
</modules>
<name>xz_bigdata2</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<cdh.version>cdh5.14.0</cdh.version>
<junit.version>4.12</junit.version>
<org.slf4j.version>1.7.5</org.slf4j.version>
<zookeeper.version>3.4.5</zookeeper.version>
<scala.version>2.10.5</scala.version>
</properties>
<repositories>
<repository>
<id>Akka repository</id>
<url>https://repo.akka.io/releases</url>
</repository>
<!--cloudera依赖-->
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<!--日志依赖-->
<dependencies>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${org.slf4j.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
2. Overall project structure
3. Creating the sub-modules
Select xz_bigdata2, right-click and choose New > Module to create a Maven sub-module; all of the modules shown in the figure above were created this way.
Note: use JDK 1.8 or later for development. The code relies on JDK 1.8-specific features, so it will not compile on lower versions.
Ctrl+Shift+Alt+S: opens Project Structure, where the project-level settings can be adjusted.
Ctrl+Alt+S: opens Settings, where the local Maven installation can be configured (under Build, Execution, Deployment > Build Tools > Maven, set the path to your local Maven repository).
Settings is also where plugins are installed, as mentioned earlier; Maven Helper and, later on, the Scala plugin can both be installed from there.
III. Common module development
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_common</artifactId>
<name>xz_bigdata_common</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<ant.version>1.9.1</ant.version>
<jaxen.version>1.1.6</jaxen.version>
<guava.version>12.0.1</guava.version>
<dom4j.version>1.6.1</dom4j.version>
<fastjson.version>1.2.5</fastjson.version>
<disruptor.version>3.3.6</disruptor.version>
<org.slf4j.version>1.7.5</org.slf4j.version>
<commons.io.version>2.4</commons.io.version>
<httpclient.version>4.2.5</httpclient.version>
<commons.exec.version>1.3</commons.exec.version>
<commons.lang.version>2.4</commons.lang.version>
<commons-vfs2.version>2.1</commons-vfs2.version>
<commons.math3.version>3.4.1</commons.math3.version>
<commons.logging.version>1.2</commons.logging.version>
<commons-httpclient.version>3.1</commons-httpclient.version>
<commons.collections4.version>4.1</commons.collections4.version>
<commons.configuration.version>1.6</commons.configuration.version>
<mysql.connector.version>5.1.46</mysql.connector.version>
<commons-dbutils.version>1.6</commons-dbutils.version>
</properties>
<dependencies>
<dependency>
<groupId>commons-dbutils</groupId>
<artifactId>commons-dbutils</artifactId>
<version>${commons-dbutils.version}</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.connector.version}</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${org.slf4j.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>${org.slf4j.version}</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>${commons.io.version}</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>${commons.lang.version}</version>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>${commons.configuration.version}</version>
</dependency>
<dependency>
<groupId>dom4j</groupId>
<artifactId>dom4j</artifactId>
<version>${dom4j.version}</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>${fastjson.version}</version>
</dependency>
<!-- <dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>-->
</dependencies>
</project>
1. config/ConfigUtil.java - reads configuration files
package com.hsiehchou.common.config;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
public class ConfigUtil {
private static final Logger LOG = LoggerFactory.getLogger(ConfigUtil.class);
private static ConfigUtil configUtil;
public static synchronized ConfigUtil getInstance(){
if(configUtil == null){
configUtil = new ConfigUtil();
}
return configUtil;
}
public Properties getProperties(String path){
Properties properties = new Properties();
try {
LOG.info("Loading configuration file " + path);
//read the configuration file from the classpath as a stream
InputStream in = this.getClass().getClassLoader().getResourceAsStream(path);
properties.load(in);
LOG.info("Configuration file " + path + " loaded successfully");
} catch (IOException e) {
LOG.error("Failed to load configuration file " + path, e);
}
return properties;
}
public static void main(String[] args) {
ConfigUtil instance = ConfigUtil.getInstance();
Properties properties = instance.getProperties("common/datatype.properties");
System.out.println(properties);
}
}
2. config/JsonReader.java
package com.hsiehchou.common.config;
import org.apache.commons.io.FileUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.File;
public class JsonReader {
private static Logger LOG = LoggerFactory.getLogger(JsonReader.class);
public static String readJson(String json_path){
JsonReader jsonReader = new JsonReader();
return jsonReader.getJson(json_path);
}
private String getJson(String json_path){
String jsonStr = "";
try {
String path = getClass().getClassLoader().getResource(json_path).toString();
path = path.replace("\\", "/");
if (path.contains(":")) {
path = path.replace("file:/","");
}
jsonStr = FileUtils.readFileToString(new File(path), "UTF-8");
LOG.error("读取json文件{}成功",path);
} catch (Exception e) {
LOG.error("读取json文件失败",e);
}
return jsonStr;
}
}
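A usage sketch follows: it reads a mapping file from the classpath and parses it with fastjson, which is already declared in this module's pom.xml. The resource path is an assumption based on the Resources module layout described later.

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.hsiehchou.common.config.JsonReader;

public class JsonReaderDemo {
    public static void main(String[] args) {
        // assumed path: es/mapping/base.json inside xz_bigdata_resources
        String jsonStr = JsonReader.readJson("es/mapping/base.json");
        JSONObject mapping = JSON.parseObject(jsonStr);
        // print the field names declared in the mapping
        System.out.println(mapping.getJSONObject("properties").keySet());
    }
}
```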
3. adjuster/Adjuster.java - data adjustment interface
package com.hsiehchou.common.adjuster;
/**
* Data adjustment interface
*/
public interface Adjuster<T, E> {
E doAdjust(T data);
}
4. adjuster/StringAdjuster.java
package com.hsiehchou.common.adjuster;
public abstract class StringAdjuster<E> implements Adjuster<String, E> {}
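As an example of how this adjuster hierarchy is meant to be used, the class below (not part of the project) normalizes a MAC address to the aa-aa-aa-aa-aa-aa format required by the phone_mac and device_mac fields.

```java
import com.hsiehchou.common.adjuster.StringAdjuster;

public class MacAdjuster extends StringAdjuster<String> {
    @Override
    public String doAdjust(String data) {
        if (data == null) {
            return null;
        }
        // keep only hex characters, then re-insert the dashes
        String hex = data.replaceAll("[^0-9a-fA-F]", "").toLowerCase();
        if (hex.length() != 12) {
            return data;   // leave unexpected input unchanged
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12; i += 2) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(hex, i, i + 2);
        }
        return sb.toString();
    }
}
```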
5. file/FileCommon.java
package com.hsiehchou.common.file;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.URL;
import java.util.List;
public class FileCommon {
private FileCommon(){}
/**
* Check whether a file exists
* @param name
* @return
*/
public static boolean exist(String name){
return exist(new File(name));
}
public static boolean exist(File file){
return file.exists();
}
/**
* Create a file, creating parent directories if necessary
* @param file
* @return
* @throws IOException
*/
public static boolean createFile(String file) throws IOException {
return createFile(new File(file));
}
public static boolean createFile(File file) throws IOException {
if(!file.exists()){
File parentDir = file.getParentFile();
if(parentDir != null && !parentDir.exists()){
//if the parent directories cannot be created, report failure
if(!parentDir.mkdirs()){
return false;
}
}
return file.createNewFile();
}
return true;
}
/**
* Read the contents of a file, line by line
* @param file
* @return
* @throws IOException
*/
public static List<String> readLines(String file) throws IOException{
return readLines(new File(file), "UTF-8");
}
public static List<String> readLines(String file, String encording) throws IOException{
return readLines(new File(file), encording);
}
public static List<String> readLines(File file, String encording) throws IOException {
List<String> lines = null;
if(FileCommon.exist(file)) {
FileInputStream fileInputStream = new FileInputStream(file);
lines = IOUtils.readLines(fileInputStream, encording);
fileInputStream.close();
}
return lines;
}
/**
* Get the file name prefix (the part before the last dot)
* @param fileName
* @return
*/
public static String getPrefix(String fileName){
String prefix = fileName;
int pos = fileName.lastIndexOf(".");
if (pos != -1){
prefix = fileName.substring(0,pos);
}
return prefix;
}
/**
* Get the file name suffix (extension), lower-cased
* @param fileName
* @return
*/
public static String getFilePostfix(String fileName){
String filePostfix = fileName.substring(fileName.lastIndexOf(".") + 1);
return filePostfix.toLowerCase();
}
/**
* Delete a file
* @param filePath
* @return
*/
public static boolean delFile(String filePath) {
boolean flag = false;
File file = new File(filePath);
if (file.isFile() && file.exists()) {
flag = file.delete();
}
return flag;
}
/**
* Move a file, overwriting the target if it exists
* @param oldPath
* @param newPath
* @return
*/
public static boolean mvFile(String oldPath,String newPath){
boolean flag = false;
File oldfile = new File(oldPath);
File newfile = new File(newPath);
if(oldfile.isFile() && oldfile.exists()){
if(newfile.exists()){
delFile(newfile.getAbsolutePath());
}
flag = oldfile.renameTo(newfile);
}
return flag;
}
/**
* Recursively delete a directory
* @param dir
* @return
*/
public static boolean deleteDir(File dir){
if (dir.isDirectory()) {
String[] children = dir.list();
//recursively delete the children of this directory first
if(children!=null){
for (int i=0; i<children.length; i++) {
boolean success = deleteDir(new File(dir, children[i]));
if (!success) {
return false;
}
}
}
}
// the directory is empty at this point and can be deleted
return dir.delete();
}
//recursively create parent directories; used by the decompression-related classes
public static void mkdirs(File file) {
File parent = file.getParentFile();
if (parent != null && (!parent.exists())) {
parent.mkdirs();
}
}
public static String getJarFilePathByClass(String clazz) throws ClassNotFoundException {
return getJarFilePathByClass(Class.forName(clazz));
}
public static String getJarFileDirByClass(String clazz) throws ClassNotFoundException {
return getJarFileDirByClass(Class.forName(clazz));
}
public static String getJarFilePathByClass(Class<?> clazz){
return new File(clazz.getProtectionDomain().getCodeSource().getLocation().getFile()).getAbsolutePath();
}
public static String getJarFileDirByClass(Class<?> clazz){
return new File(getJarFilePathByClass(clazz)).getParent();
}
public static String getAbstractPath(String abstractPath) throws Exception{
URL url = FileCommon.class.getClassLoader().getResource(abstractPath);
System.out.println("配置文件路径为" + url);
File file = new File(url.getFile());
String content= FileUtils.readFileToString(file,"UTF-8");
return content;
}
public static String getAbstractPath111(String abstractPath) throws Exception{
File file = new File(abstractPath);
String content= FileUtils.readFileToString(file,"UTF-8");
return content;
}
}
6. filter/Filter.java - top-level data filter interface
package com.hsiehchou.common.filter;
/**
* Top-level data filter interface
*/
public interface Filter<T> {
boolean filter(T obj);
}
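An illustrative implementation (not part of the project): a filter that keeps only records carrying a non-empty phone_mac. Here the convention assumed is that true means "keep the record".

```java
import com.hsiehchou.common.filter.Filter;

import java.util.Map;

public class PhoneMacFilter implements Filter<Map<String, String>> {
    @Override
    public boolean filter(Map<String, String> record) {
        String mac = (record == null) ? null : record.get("phone_mac");
        return mac != null && !mac.trim().isEmpty();
    }
}
```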
7. net/HttpRequest.java
package com.hsiehchou.common.net;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.Map;
public class HttpRequest {
private static final Logger LOG = LoggerFactory.getLogger(HttpRequest.class);
/**
* Send a GET request to the given URL
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
* @return the response body of the remote resource
*/
public static String sendGet(String url, String param) {
String result = "";
BufferedReader in = null;
try {
String urlNameString = url + "?" + param;
URL realUrl = new URL(urlNameString);
// 打开和URL之间的连接
URLConnection connection = realUrl.openConnection();
// 设置通用的请求属性
connection.setRequestProperty("accept", "*/*");
connection.setRequestProperty("connection", "Keep-Alive");
connection.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
// 建立实际的连接
connection.connect();
// 获取所有响应头字段
//Map<String, List<String>> map = connection.getHeaderFields();
// 遍历所有的响应头字段
// 定义 BufferedReader输入流来读取URL的响应
in = new BufferedReader(new InputStreamReader(connection.getInputStream(),"UTF-8"));
String line;
while ((line = in.readLine()) != null) {
result += line;
}
} catch (Exception e) {
LOG.info("发送GET请求出现异常!" + (url+param));
System.out.println("发送GET请求出现异常!" + e);
e.printStackTrace();
}
// 使用finally块来关闭输入流
finally {
try {
if (in != null) {
in.close();
}
} catch (Exception e2) {
e2.printStackTrace();
}
}
return result;
}
/**
* Send a GET request with an Authorization header to the given URL
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
* @return the response body of the remote resource
*/
public static String sendGet(String url, String param,String authorization) {
String result = "";
BufferedReader in = null;
try {
String urlNameString = url + "?" + param;
URL realUrl = new URL(urlNameString);
// 打开和URL之间的连接
URLConnection connection = realUrl.openConnection();
// 设置通用的请求属性
connection.setRequestProperty("accept", "*/*");
connection.setRequestProperty("connection", "Keep-Alive");
connection.setRequestProperty("Authorization", authorization);
connection.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
// 建立实际的连接
connection.connect();
// 获取所有响应头字段
connection.getHeaderFields();
// 遍历所有的响应头字段
/* for (String key : map.keySet()) {
System.out.println(key + "--->" + map.get(key));
}*/
// 定义 BufferedReader输入流来读取URL的响应
in = new BufferedReader(new InputStreamReader(
connection.getInputStream(),"UTF-8"));
String line;
while ((line = in.readLine()) != null) {
result += line;
}
} catch (Exception e) {
LOG.info("发送POST请求出现异常!" + (url+param));
System.out.println("发送POST请求出现异常!" + e);
e.printStackTrace();
}
// 使用finally块来关闭输入流
finally {
try {
if (in != null) {
in.close();
}
} catch (Exception e2) {
e2.printStackTrace();
}
}
return result;
}
public static void main(String[] args) throws Exception{
}
/**
* Send a POST request to the given URL
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
* @return the response body of the remote resource
*/
public static String sendPost(String url, String param) {
PrintWriter out = null;
BufferedReader in = null;
String result = "";
try {
URL realUrl = new URL(url);
// 打开和URL之间的连接
URLConnection conn = realUrl.openConnection();
// 设置通用的请求属性
conn.setRequestProperty("Content-Type","application/json");
//conn.setInstanceFollowRedirects(false);
// conn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
conn.setRequestProperty("accept", "*/*");
conn.setRequestProperty("connection", "Keep-Alive");
conn.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
// 发送POST请求必须设置如下两行
conn.setReadTimeout(30000);
conn.setDoOutput(true);
conn.setDoInput(true);
// 获取URLConnection对象对应的输出流
out = new PrintWriter(conn.getOutputStream());
// 发送请求参数
out.print(param);
// flush输出流的缓冲
out.flush();
// 定义BufferedReader输入流来读取URL的响应
InputStream inputStream = conn.getInputStream();
in = new BufferedReader(new InputStreamReader(inputStream,"UTF-8"));
String line;
while ((line = in.readLine()) != null) {
result += line;
}
}
catch (IOException e) {
LOG.info("发送POST请求出现异常!" + (url+param),e);
}
//使用finally块来关闭输出流、输入流
finally{
try{
if(out!=null){
out.close();
}
if(in!=null){
in.close();
}
}
catch(IOException ex){
ex.printStackTrace();
}
}
return result;
}
/*
* params: the URL parameters to send; values are URL-encoded as UTF-8
*/
public static String sendPostMessage(String url1,Map<String,Object> params){
String response = null;
Reader in = null;
try {
//访问准备
URL url = new URL(url1);
//开始访问
StringBuilder postData = new StringBuilder();
for (Map.Entry<String,Object> param : params.entrySet()) {
if (postData.length() != 0) postData.append('&');
postData.append(URLEncoder.encode(param.getKey(), "UTF-8"));
postData.append('=');
postData.append(URLEncoder.encode(String.valueOf(param.getValue()), "UTF-8"));
}
byte[] postDataBytes = postData.toString().getBytes("UTF-8");
URLConnection conn = url.openConnection();
//URLConnection conn = url.openConnection();
//conn.setRequestMethod("POST");
//conn.setInstanceFollowRedirects(false);
//conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
conn.setRequestProperty("Content-Type", "application/json");
conn.setRequestProperty("Content-Length", String.valueOf(postDataBytes.length));
conn.setDoOutput(true);
conn.getOutputStream().write(postDataBytes);
in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
StringBuilder sb = new StringBuilder();
for (int c; (c = in.read()) >= 0;)
sb.append((char)c);
response = sb.toString();
//System.out.println(response);
} catch (IOException e) {
LOG.error(null,e);
}finally {
if(in != null){
try {
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return response;
}
/**
* Send a POST request to the given URL without reading the response body
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
*/
public static void sendPostWithoutReturn(String url, String param) {
PrintWriter out = null;
BufferedReader in = null;
String result = "";
try {
URL realUrl = new URL(url);
// 打开和URL之间的连接
HttpURLConnection conn = (HttpURLConnection )realUrl.openConnection();
// 设置通用的请求属性
conn.setRequestProperty("Content-Type","application/json");
conn.setRequestProperty("accept", "*/*");
conn.setRequestProperty("connection", "Keep-Alive");
conn.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
//根据需求设置读超时的时间
conn.setReadTimeout(1000);
// 发送POST请求必须设置如下两行
conn.setDoOutput(true);
conn.setDoInput(true);
// 获取URLConnection对象对应的输出流
out = new PrintWriter(conn.getOutputStream());
// 发送请求参数
out.print(param);
// flush输出流的缓冲
out.flush();
// 定义BufferedReader输入流来读取URL的响应
if (conn.getResponseCode() == 200) {
System.out.println("连接成功,传送数据...");
} else {
System.out.println("连接失败,错误代码:"+conn.getResponseCode());
}
}
catch (IOException e) {
LOG.info("发送POST请求出现异常!" + (url+param),e);
}
//使用finally块来关闭输出流、输入流
finally{
try{
if(out!=null){
out.close();
}
if(in!=null){
in.close();
}
}
catch(Exception ex){
ex.printStackTrace();
}
}
}
}
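A usage sketch with placeholder URLs (they are not services that exist in this project):

```java
import com.hsiehchou.common.net.HttpRequest;

public class HttpRequestDemo {
    public static void main(String[] args) {
        // GET: parameters are appended as ?name1=value1&name2=value2
        String getResult = HttpRequest.sendGet("http://localhost:9200/_cat/indices", "v=true");
        System.out.println(getResult);

        // POST: the body is sent as-is with Content-Type: application/json
        String postResult = HttpRequest.sendPost("http://localhost:8080/api", "{\"key\":\"value\"}");
        System.out.println(postResult);
    }
}
```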
8. netb/db/DBCommon.java - basic class for opening and closing MySQL connections
package com.hsiehchou.common.netb.db;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.Properties;
public class DBCommon {
private static Logger LOG = LoggerFactory.getLogger(DBCommon.class);
private static String MYSQL_PATH = "common/mysql.properties";
private static Properties properties = ConfigUtil.getInstance().getProperties(MYSQL_PATH);
private static Connection conn ;
private DBCommon(){}
public static void main(String[] args) {
System.out.println(properties);
Connection xz_bigdata = DBCommon.getConn("test");
System.out.println(xz_bigdata);
}
//TODO: read these settings from the configuration file as well
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
private static final String USER_NAME = properties.getProperty("user");
private static final String PASSWORD = properties.getProperty("password");
private static final String IP = properties.getProperty("db_ip");
private static final String PORT = properties.getProperty("db_port");
private static final String DB_CONFIG = "?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&autoReconnect=true&failOverReadOnly=false";
static {
try {
Class.forName(JDBC_DRIVER);
} catch (ClassNotFoundException e) {
LOG.error(null, e);
}
}
/**
* Get a database connection
* @param dbName
* @return
*/
public static Connection getConn(String dbName) {
Connection conn = null;
String connstring = "jdbc:mysql://"+IP+":"+PORT+"/"+dbName+DB_CONFIG;
try {
conn = DriverManager.getConnection(connstring, USER_NAME, PASSWORD);
} catch (SQLException e) {
e.printStackTrace();
LOG.error(null, e);
}
return conn;
}
/**
* @param url eg:"jdbc:oracle:thin:@172.16.1.111:1521:d406"
* @param driver eg:"oracle.jdbc.driver.OracleDriver"
* @param user eg:"ucase"
* @param password eg:"ucase123"
* @return
* @throws ClassNotFoundException
* @throws SQLException
*/
public static Connection getConn(String url, String driver, String user,
String password) throws ClassNotFoundException, SQLException{
Class.forName(driver);
conn = DriverManager.getConnection(url, user, password);
return conn;
}
public static void close(Connection conn){
try {
if( conn != null ){
conn.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Statement statement){
try {
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,PreparedStatement statement){
try {
if( conn != null ){
conn.close();
}
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,Statement statement,ResultSet resultSet) throws SQLException{
if( resultSet != null ){
resultSet.close();
}
if( statement != null ){
statement.close();
}
if( conn != null ){
conn.close();
}
}
}
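A usage sketch combining DBCommon with commons-dbutils (already declared in this module's pom.xml); the database name and SQL here are placeholders.

```java
import com.hsiehchou.common.netb.db.DBCommon;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.MapListHandler;

import java.sql.Connection;
import java.util.List;
import java.util.Map;

public class DBCommonDemo {
    public static void main(String[] args) throws Exception {
        Connection conn = DBCommon.getConn("test");   // database name is a placeholder
        try {
            QueryRunner runner = new QueryRunner();
            // placeholder SQL -- replace with a real table
            List<Map<String, Object>> rows =
                    runner.query(conn, "SELECT * FROM some_table LIMIT 10", new MapListHandler());
            rows.forEach(System.out::println);
        } finally {
            DBCommon.close(conn);
        }
    }
}
```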
9. project/datatype/DataTypeProperties.java
package com.hsiehchou.common.project.datatype;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.*;
public class DataTypeProperties {
private static final Logger logger = LoggerFactory.getLogger(DataTypeProperties.class);
private static final String DATA_PATH = "common/datatype.properties";
public static Map<String,ArrayList<String>> dataTypeMap = null;
static {
Properties properties = ConfigUtil.getInstance().getProperties(DATA_PATH);
dataTypeMap = new HashMap<>();
Set<Object> keys = properties.keySet();
keys.forEach(key->{
String[] split = properties.getProperty(key.toString()).split(",");
dataTypeMap.put(key.toString(),new ArrayList<>(Arrays.asList(split)));
});
}
public static void main(String[] args) {
Map<String, ArrayList<String>> dataTypeMap = DataTypeProperties.dataTypeMap;
System.out.println(dataTypeMap.toString());
}
}
10. regex/Validation.java - validation utility class
package com.hsiehchou.common.regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Validation utility class
*/
public class Validation {
// ------------------ constant definitions
/**
* Email正则表达式=
* "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$"
* ;
*/
// public static final String EMAIL =
// "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";;
public static final String EMAIL = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)+";
/**
* 电话号码正则表达式=
* (^(\d{2,4}[-_-—]?)?\d{3,8}([-_-—]?\d{3,8})?([-_-—]?\d{1,7})?$)|
* (^0?1[35]\d{9}$)
*/
public static final String PHONE = "(^(\\d{2,4}[-_-—]?)?\\d{3,8}([-_-—]?\\d{3,8})?([-_-—]?\\d{1,7})?$)|(^0?1[35]\\d{9}$)";
/**
* 手机号码正则表达式=^(13[0-9]|15[0-9]|18[0-9])\d{8}$
*/
public static final String MOBILE = "^((13[0-9])|(14[5-7])|(15[^4])|(17[0-8])|(18[0-9]))\\d{8}$";
/**
* Integer正则表达式 ^-?(([1-9]\d*$)|0)
*/
public static final String INTEGER = "^-?(([1-9]\\d*$)|0)";
/**
* 正整数正则表达式 >=0 ^[1-9]\d*|0$
*/
public static final String INTEGER_NEGATIVE = "^[1-9]\\d*|0$";
/**
* 负整数正则表达式 <=0 ^-[1-9]\d*|0$
*/
public static final String INTEGER_POSITIVE = "^-[1-9]\\d*|0$";
/**
* Double正则表达式 ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
*/
public static final String DOUBLE = "^-?([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0)$";
/**
* 正Double正则表达式 >=0 ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$
*/
public static final String DOUBLE_NEGATIVE = "^[1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0$";
/**
* 负Double正则表达式 <= 0 ^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$
*/
public static final String DOUBLE_POSITIVE = "^(-([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*))|0?\\.0+|0$";
/**
* 年龄正则表达式 ^(?:[1-9][0-9]?|1[01][0-9]|120)$ 匹配0-120岁
*/
public static final String AGE = "^(?:[1-9][0-9]?|1[01][0-9]|120)$";
/**
* 邮编正则表达式 [0-9]\d{5}(?!\d) 国内6位邮编
*/
public static final String CODE = "[0-9]\\d{5}(?!\\d)";
/**
* 匹配由数字、26个英文字母或者下划线组成的字符串 ^\w+$
*/
public static final String STR_ENG_NUM_ = "^\\w+$";
/**
* 匹配由数字和26个英文字母组成的字符串 ^[A-Za-z0-9]+$
*/
public static final String STR_ENG_NUM = "^[A-Za-z0-9]+";
/**
* 匹配由26个英文字母组成的字符串 ^[A-Za-z]+$
*/
public static final String STR_ENG = "^[A-Za-z]+$";
/**
* 过滤特殊字符串正则 regEx=
* "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
*/
public static final String STR_SPECIAL = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
/***
* 日期正则 支持: YYYY-MM-DD YYYY/MM/DD YYYY_MM_DD YYYYMMDD YYYY.MM.DD的形式
*/
public static final String DATE_ALL = "((^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(10|12|0?[13578])([-\\/\\._]?)(3[01]|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(11|0?[469])([-\\/\\._]?)(30|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(0?2)([-\\/\\._]?)(2[0-8]|1[0-9]|0?[1-9])$)|(^([2468][048]00)([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([3579][26]00)"
+ "([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][0][48])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][0][48])([-\\/\\._]?)"
+ "(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][2468][048])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][2468][048])([-\\/\\._]?)(0?2)"
+ "([-\\/\\._]?)(29)$)|(^([1][89][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|"
+ "(^([2-9][0-9][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$))";
/***
* 日期正则 支持: YYYY-MM-DD
*/
public static final String DATE_FORMAT1 = "(([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8]))))|((([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[3579][26])00))-02-29)";
/**
* URL正则表达式 匹配 http www ftp
*/
public static final String URL = "^(http|www|ftp|)?(://)?(\\w+(-\\w+)*)(\\.(\\w+(-\\w+)*))*((:\\d+)?)(/(\\w+(-\\w+)*))*(\\.?(\\w)*)(\\?)?"
+ "(((\\w*%)*(\\w*\\?)*(\\w*:)*(\\w*\\+)*(\\w*\\.)*(\\w*&)*(\\w*-)*(\\w*=)*(\\w*%)*(\\w*\\?)*"
+ "(\\w*:)*(\\w*\\+)*(\\w*\\.)*"
+ "(\\w*&)*(\\w*-)*(\\w*=)*)*(\\w*)*)$";
/**
* 身份证正则表达式
*/
public static final String IDCARD = "((11|12|13|14|15|21|22|23|31|32|33|34|35|36|37|41|42|43|44|45|46|50|51|52|53|54|61|62|63|64|65)[0-9]{4})"
+ "(([1|2][0-9]{3}[0|1][0-9][0-3][0-9][0-9]{3}"
+ "[Xx0-9])|([0-9]{2}[0|1][0-9][0-3][0-9][0-9]{3}))";
/**
* 机构代码
*/
public static final String JIGOU_CODE = "^[A-Z0-9]{8}-[A-Z0-9]$";
/**
* 匹配数字组成的字符串 ^[0-9]+$
*/
public static final String STR_NUM = "^[0-9]+$";
// ------------------ validation methods
/**
* 判断字段是否为空 符合返回ture
* @param str
* @return boolean
*/
public static synchronized boolean StrisNull(String str) {
return null == str || str.trim().length() <= 0 ? true : false;
}
/**
* 判断字段是非空 符合返回ture
* @param str
* @return boolean
*/
public static boolean StrNotNull(String str) {
return !StrisNull(str);
}
/**
* 字符串null转空
* @param str
* @return boolean
*/
public static String nulltoStr(String str) {
return StrisNull(str) ? "" : str;
}
/**
* 字符串null赋值默认值
* @param str 目标字符串
* @param defaut 默认值
* @return String
*/
public static String nulltoStr(String str, String defaut) {
return StrisNull(str) ? defaut : str;
}
/**
* 判断字段是否为Email 符合返回ture
* @param str
* @return boolean
*/
public static boolean isEmail(String str) {
return Regular(str, EMAIL);
}
/**
* 判断是否为电话号码 符合返回ture
* @param str
* @return boolean
*/
public static boolean isPhone(String str) {
return Regular(str, PHONE);
}
/**
* 判断是否为手机号码 符合返回ture
* @param str
* @return boolean
*/
public static boolean isMobile(String str) {
return RegularSJHM(str, MOBILE);
}
/**
* 判断是否为Url 符合返回ture
* @param str
* @return boolean
*/
public static boolean isUrl(String str) {
return Regular(str, URL);
}
/**
* 判断字段是否为数字 正负整数 正负浮点数 符合返回ture
* @param str
* @return boolean
*/
public static boolean isNumber(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为INTEGER 符合返回ture
* @param str
* @return boolean
*/
public static boolean isInteger(String str) {
return Regular(str, INTEGER);
}
/**
* 判断字段是否为正整数正则表达式 >=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isINTEGER_NEGATIVE(String str) {
return Regular(str, INTEGER_NEGATIVE);
}
/**
* 判断字段是否为负整数正则表达式 <=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isINTEGER_POSITIVE(String str) {
return Regular(str, INTEGER_POSITIVE);
}
/**
* 判断字段是否为DOUBLE 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDouble(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为正浮点数正则表达式 >=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDOUBLE_NEGATIVE(String str) {
return Regular(str, DOUBLE_NEGATIVE);
}
/**
* 判断字段是否为负浮点数正则表达式 <=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDOUBLE_POSITIVE(String str) {
return Regular(str, DOUBLE_POSITIVE);
}
/**
* 判断字段是否为日期 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDate(String str) {
return Regular(str, DATE_ALL);
}
/**
* 验证2010-12-10
* @param str
* @return
*/
public static boolean isDate1(String str) {
return Regular(str, DATE_FORMAT1);
}
/**
* 判断字段是否为年龄 符合返回ture
* @param str
* @return boolean
*/
public static boolean isAge(String str) {
return Regular(str, AGE);
}
/**
* 判断字段是否超长 字串为空返回fasle, 超过长度{leng}返回ture 反之返回false
* @param str
* @param leng
* @return boolean
*/
public static boolean isLengOut(String str, int leng) {
return StrisNull(str) ? false : str.trim().length() > leng;
}
/**
* 判断字段是否为身份证 符合返回ture
* @param str
* @return boolean
*/
public static boolean isIdCard(String str) {
if (StrisNull(str))
return false;
if (str.trim().length() == 15 || str.trim().length() == 18) {
return Regular(str, IDCARD);
} else {
return false;
}
}
/**
* 判断字段是否为邮编 符合返回ture
* @param str
* @return boolean
*/
public static boolean isCode(String str) {
return Regular(str, CODE);
}
/**
* 判断字符串是不是全部是英文字母
* @param str
* @return boolean
*/
public static boolean isEnglish(String str) {
return Regular(str, STR_ENG);
}
/**
* 判断字符串是不是全部是英文字母+数字
* @param str
* @return boolean
*/
public static boolean isENG_NUM(String str) {
return Regular(str, STR_ENG_NUM);
}
/**
* 判断字符串是不是全部是英文字母+数字+下划线
* @param str
* @return boolean
*/
public static boolean isENG_NUM_(String str) {
return Regular(str, STR_ENG_NUM_);
}
/**
* 过滤特殊字符串 返回过滤后的字符串
* @param str
* @return boolean
*/
public static String filterStr(String str) {
Pattern p = Pattern.compile(STR_SPECIAL);
Matcher m = p.matcher(str);
return m.replaceAll("").trim();
}
/**
* 校验机构代码格式
* @return
*/
public static boolean isJigouCode(String str) {
return Regular(str, JIGOU_CODE);
}
/**
* 判断字符串是不是数字组成
* @param str
* @return boolean
*/
public static boolean isSTR_NUM(String str) {
return Regular(str, STR_NUM);
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean Regular(String str, String pattern) {
if (null == str || str.trim().length() <= 0)
return false;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean RegularSJHM(String str, String pattern) {
if (null == str || str.trim().length() <= 0){
return false;
}
if(str.contains("+86")){
str=str.replace("+86","");
}
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* description: match a timestamp in yyyyMMddHHmmss format
* @param time
* @return boolean
*/
public static final String yyyyMMddHHmmss = "[0-9]{14}";
public static boolean isyyyyMMddHHmmss(String time) {
if (time == null) {
return false;
}
boolean bool = time.matches(yyyyMMddHHmmss);
return bool;
}
/**
* description: match a MAC address in AA-BB-CC-DD-EE-FF format
* @param mac
* @return boolean
*/
public static final String isMac = "^[A-F0-9]{2}(-[A-F0-9]{2}){5}$";
public static boolean isMac(String mac) {
if (mac == null) {
return false;
}
boolean bool = mac.matches(isMac);
return bool;
}
/**
* description: match a 10-digit Unix timestamp (seconds)
* @param timestamp
* @return boolean
*/
public static final String longtime = "[0-9]{10}";
public static boolean isTimestamp(String timestamp) {
if (timestamp == null) {
return false;
}
boolean bool = timestamp.matches(longtime);
return bool;
}
/**
* Check whether the field is a valid datatype code; returns true if it matches
* @param str
* @return boolean
*/
public static final String DATATYPE = "^\\d{7}$";
public static boolean isDATATYPE(String str) {
return Regular(str, DATATYPE);
}
/**
* Check whether the field is a QQ number; returns true if it matches
* @param str
* @return boolean
*/
public static final String QQ = "^\\d{5,15}$";
public static boolean isQQ(String str) {
return Regular(str, QQ);
}
/**
* Check whether the field is an IMSI; returns true if it matches
* @param str
* @return boolean
*/
//public static final String IMSI = "^4600[0,1,2,3,4,5,6,7,9]\\d{10}|(46011|46020)\\d{10}$";
public static final String IMSI = "^[1-9][0-9][0-9]0[0,1,2,3,4,5,6,7,9]\\d{10}|[1-9][0-9][0-9](11|20)\\d{10}$";
public static boolean isIMSI(String str) {
return Regular(str, IMSI);
}
/**
* Check whether the field is an IMEI; returns true if it matches
* @param str
* @return boolean
*/
public static final String IMEI = "^\\d{8}$|^[a-fA-F0-9]{14}$|^\\d{15}$";
public static boolean isIMEI(String str) {return Regular(str, IMEI);}
/**
* Check whether the field is a valid capture time; returns true if it matches
* @param str
* @return boolean
*/
public static final String CAPTURETIME = "^\\d{10}|(20[0-9][0-9])\\d{10}$";
public static boolean isCAPTURETIME(String str) {return Regular(str, CAPTURETIME);}
/**
* description: check the authentication type code
* @param str
* @return boolean
*/
public static final String AUTH_TYPE = "^\\d{7}$";
public static boolean isAUTH_TYPE(String str) {return Regular(str, AUTH_TYPE);}
/**
* description: check the FIRM_CODE
* @param str
* @return boolean
*/
public static final String FIRM_CODE = "^\\d{9}$";
public static boolean isFIRM_CODE(String str) {return Regular(str, FIRM_CODE);}
/**
* description: check a longitude value
* @param str
* @return boolean
*/
public static final String LONGITUDE = "^-?(([1-9]\\d?)|(1[0-7]\\d)|180)(\\.\\d{1,8})?$";
//public static final String LONGITUDE ="^([-]?(\\d|([1-9]\\d)|(1[0-7]\\d)|(180))(\\.\\d*)\\,[-]?(\\d|([1-8]\\d)|(90))(\\.\\d*))$";
public static boolean isLONGITUDE(String str) {return Regular(str, LONGITUDE);}
/**
* description: check a latitude value
* @param str
* @return boolean
*/
public static final String LATITUDE = "^-?(([1-8]\\d?)|([1-8]\\d)|90)(\\.\\d{1,8})?$";
public static boolean isLATITUDE(String str) {return Regular(str, LATITUDE);}
public static void main(String[] args) {
boolean bool = isLATITUDE("26.0615854");
System.out.println(bool);
}
}
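A few quick checks showing how the validators above behave on well-formed input:

```java
import com.hsiehchou.common.regex.Validation;

public class ValidationDemo {
    public static void main(String[] args) {
        System.out.println(Validation.isMobile("13800138000"));     // true: 11-digit mobile number
        System.out.println(Validation.isIMEI("123456789012345"));   // true: 15 digits
        System.out.println(Validation.isMac("AA-BB-CC-DD-EE-FF"));  // true: upper-case, dash-separated
        System.out.println(Validation.isLONGITUDE("104.0665"));     // true
        System.out.println(Validation.isLATITUDE("30.5728"));       // true
    }
}
```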
11. thread/ThreadPoolManager.java - thread pool manager singleton
package com.hsiehchou.common.thread;
import java.io.Serializable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
* Thread pool manager singleton.
* By default it provides a newCachedThreadPool (a cached thread pool).
* A fixed-size pool (newFixedThreadPool) can be obtained by passing a thread count.
*/
public class ThreadPoolManager implements Serializable {
private static final long serialVersionUID = 1465361469484903956L;
public static final ThreadPoolManager threadPoolManager = new ThreadPoolManager();
private static ThreadPoolManager tpm;
private transient ExecutorService newCachedThreadPool;
private transient ExecutorService newFixedThreadPool;
private int poolCapacity;
private ThreadPoolManager(){
if( newCachedThreadPool == null )
newCachedThreadPool = Executors.newCachedThreadPool();
}
@Deprecated
public static ThreadPoolManager getInstance(){
if( tpm == null ){
synchronized(ThreadPoolManager.class){
if( tpm == null )
tpm = new ThreadPoolManager();
}
}
return tpm;
}
/**
* Returns the shared newCachedThreadPool
*/
public ExecutorService getExecutorService(){
if( newCachedThreadPool == null ){
synchronized(ThreadPoolManager.class){
if( newCachedThreadPool == null )
newCachedThreadPool = Executors.newCachedThreadPool();
}
}
return newCachedThreadPool;
}
/**
* Returns a newFixedThreadPool with the given capacity
*/
public ExecutorService getExecutorService(int poolCapacity){
return getExecutorService(poolCapacity, false);
}
/**
* Returns a newFixedThreadPool with the given capacity, optionally shutting down the old pool
*/
public synchronized ExecutorService getExecutorService(int poolCapacity, boolean closeOld){
if(newFixedThreadPool == null || (this.poolCapacity != poolCapacity)){
if(newFixedThreadPool != null && closeOld){
newFixedThreadPool.shutdown();
}
newFixedThreadPool = Executors.newFixedThreadPool(poolCapacity);
this.poolCapacity = poolCapacity;
}
return newFixedThreadPool;
}
}
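Usage sketch: obtain the shared cached pool and submit a task (JDK 1.8 lambda).

```java
import com.hsiehchou.common.thread.ThreadPoolManager;

import java.util.concurrent.ExecutorService;

public class ThreadPoolDemo {
    public static void main(String[] args) {
        ExecutorService pool = ThreadPoolManager.threadPoolManager.getExecutorService();
        pool.submit(() -> System.out.println("running in " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```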
12. time/TimeTranstationUtils.java - time conversion utility class
package com.hsiehchou.common.time;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
/**
* Description: time conversion utility class
*/
public class TimeTranstationUtils {
private static final Logger logger = LoggerFactory.getLogger(TimeTranstationUtils.class);
/* private static SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
private static SimpleDateFormat sdFormatternew = new SimpleDateFormat("yyyyMMddHH");
private static SimpleDateFormat sdFormatter1 = new SimpleDateFormat("yyyy-MM-dd");
private static SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private static SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");*/
private static Date nowTime;
public static String Date2yyyyMMddHHmmss() {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter.format(nowTime);
return time;
}
public static String Date2yyyyMMddHHmmss(long timestamp) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
nowTime = new Date(timestamp);
String time = sdFormatter.format(nowTime);
return time;
}
public static String Date2yyyyMMdd(long timestamp) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMdd");
nowTime = new Date(timestamp);
String time = sdFormatter.format(nowTime);
return time;
}
public static String Date2yyyyMMddHH(String str) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
SimpleDateFormat sdFormatternew = new SimpleDateFormat("yyyyMMddHH");
try {
nowTime = sdFormatter.parse(str);
} catch (ParseException e) {
e.printStackTrace();
}
String time = sdFormatternew.format(nowTime);
return time;
}
public static String Date2yyyy_MM_dd() {
SimpleDateFormat sdFormatter1 = new SimpleDateFormat("yyyy-MM-dd");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter1.format(nowTime);
return time;
}
public static String Date2yyyy_MM_dd_HH_mm_ss() {
SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter2.format(nowTime);
return time;
}
public static String Date2yyyyMMdd() {
SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter3.format(nowTime);
return time;
}
public static String Date2yyyyMMdd(String str) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");
try {
nowTime = sdFormatter.parse(str);
} catch (ParseException e) {
e.printStackTrace();
}
String time = sdFormatter3.format(nowTime);
return time;
}
public static Long Date2yyyyMMddHHmmssToLong() {
return System.currentTimeMillis() / 1000;
}
public static String long2date(String capturetime){
SimpleDateFormat sdf= new SimpleDateFormat("yyyyMMdd");
//the input is in seconds; multiply by 1000 to get milliseconds, then convert to java.util.Date
Date dt = new Date(Long.valueOf(capturetime) * 1000);
String sDateTime = sdf.format(dt); //formatted as yyyyMMdd
return sDateTime;
}
public static Long yyyyMMddHHmmssToLong(String time) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
if (StringUtils.isBlank(time)) {
return 0L;
} else {
boolean isNum = time.matches("[0-9]+");
if (isNum) {
long long1 = 0;
try {
long1 = sdFormatter.parse(time).getTime();
} catch (ParseException e) {
logger.error(time + "时间转换为long错误" + isNum);
return 0L;
}
return long1 / 1000;
}
}
return 0L;
}
public static Date yyyyMMddHHmmssToDate(String time) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
if (StringUtils.isBlank(time)) {
return new Date();
} else {
boolean isNum = time.matches("[0-9]+");
if (isNum) {
Date date = null;
try {
date = sdFormatter.parse(time);
} catch (ParseException e) {
logger.error(time + "时间转换为date错误" + isNum, e);
System.out.println(time);
System.out.println(isNum);
e.printStackTrace();
}
return date;
}
}
return new Date();
}
public static Date yyyyMMddHHmmssToDate() {
Date date = null;
SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
try {
date = sdFormatter2.parse(Date2yyyy_MM_dd_HH_mm_ss());
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return date;
}
public static java.sql.Date strToDate(String strDate) {
String str = strDate;
SimpleDateFormat format = new SimpleDateFormat("yyyy-mm-dd");
Date d = null;
try {
d = format.parse(str);
} catch (Exception e) {
e.printStackTrace();
}
java.sql.Date date = new java.sql.Date(d.getTime());
return date;
}
public static Long str2Long(String str){
if(!StringUtils.isBlank(str)){
return Long.valueOf(str);
}else{
return 0L;
}
}
public static Double str2Double(String str){
if(!StringUtils.isBlank(str)){
return Double.valueOf(str);
}else{
return 0.0;
}
}
public static HashMap<String,Object> mapString2Long(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (!StringUtils.isBlank(logouttime)) {
objectMap.put(key, Long.valueOf(logouttime));
} else {
objectMap.put(key, 0L);
}
return objectMap;
}
public static void main(String[] args) throws InterruptedException {
System.out.println(long2date("1463487992"));
}
}
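A usage sketch for the conversions used most often later in the project:

```java
import com.hsiehchou.common.time.TimeTranstationUtils;

public class TimeDemo {
    public static void main(String[] args) {
        // current time as yyyyMMddHHmmss
        System.out.println(TimeTranstationUtils.Date2yyyyMMddHHmmss());
        // yyyyMMddHHmmss string -> seconds since the epoch
        System.out.println(TimeTranstationUtils.yyyyMMddHHmmssToLong("20190101120000"));
        // 10-digit second timestamp -> yyyyMMdd
        System.out.println(TimeTranstationUtils.long2date("1463487992"));
    }
}
```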
IV. Resources module development
Structure of xz_bigdata_resources
Note: right-click the resources directory here and choose Mark Directory as > Resources Root so it becomes a resources source folder; the configuration files in it can then be loaded from anywhere in the project.
1. Under resources
log4j2.properties
log4j.rootLogger = error,stdout,D,E
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = F://logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =F://logs/error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
2. common
datatype.properties
# base = datatype,idcard,name,age,collecttime,imei
# wechat = datatype,wechat,phone,collecttime,imei
wechat = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
mail = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,send_mail,send_time,accept_mail,accept_time,mail_content,mail_type
qq = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
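The sketch below shows how these field lists are meant to be used: one raw line from a collected file is split and zipped with the configured field names via DataTypeProperties from the Common module. The separator is an assumption; adapt it to the real collector output.

```java
import com.hsiehchou.common.project.datatype.DataTypeProperties;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class LineToMap {
    public static Map<String, String> toMap(String table, String line, String separator) {
        // field order comes from datatype.properties, e.g. table = "wechat"
        ArrayList<String> fields = DataTypeProperties.dataTypeMap.get(table);
        String[] values = line.split(separator, -1);
        Map<String, String> record = new HashMap<>();
        for (int i = 0; i < fields.size() && i < values.length; i++) {
            record.put(fields.get(i), values[i]);
        }
        return record;
    }
}
```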
mysql.properties
db_ip = 192.168.116.201
db_port = 3306
user = root
password = root
3. es
es_cluster.properties
es.cluster.name=xz_es
es.cluster.nodes = hadoop1,hadoop2,hadoop3
es.cluster.nodes1 = hadoop1
es.cluster.nodes2 = hadoop2
es.cluster.nodes3 = hadoop3
es.cluster.tcp.port = 9300
es.cluster.http.port = 9200
mapping/base.json
{
"_source": {
"enabled": true
},
"properties": {
"datatype":{"type": "keyword"},
"idcard":{"type": "keyword"},
"name":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"age":{"type": "long"},
"collecttime":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"imei":{"type": "keyword"}
}
}
mapping/fieldmapping.properties
tables = wechat,mail,qq
wechat.imei = string
wechat.imsi = string
wechat.longitude = double
wechat.latitude = double
wechat.phone_mac = string
wechat.device_mac = string
wechat.device_number = string
wechat.collect_time = long
wechat.username = string
wechat.phone = string
wechat.object_username = string
wechat.send_message = string
wechat.accept_message = string
wechat.message_time = long
wechat.id = string
wechat.table = string
wechat.filename = string
wechat.absolute_filename = string
mail.imei = string
mail.imsi = string
mail.longitude = double
mail.latitude = double
mail.phone_mac = string
mail.device_mac = string
mail.device_number = string
mail.collect_time = long
mail.send_mail = string
mail.send_time = long
mail.accept_mail = string
mail.accept_time = long
mail.mail_content = string
mail.mail_type = string
mail.id = string
mail.table = string
mail.filename = string
mail.absolute_filename = string
qq.imei = string
qq.imsi = string
qq.longitude = double
qq.latitude = double
qq.phone_mac = string
qq.device_mac = string
qq.device_number = string
qq.collect_time = long
qq.username = string
qq.phone = string
qq.object_username = string
qq.send_message = string
qq.accept_message = string
qq.message_time = long
qq.id = string
qq.table = string
qq.filename = string
qq.absolute_filename = string
mapping/mail.json
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"send_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_time":{"type": "long"},
"accept_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_time":{"type": "long"},
"mail_content":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"mail_type":{"type": "keyword"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
mapping/qq.json
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
mapping/test.json
{
"_source": {
"enabled": true
},
"properties": {
"id":{"type": "keyword"},
"source":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"target":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"library_id":{"type": "long"},
"source_sign":{"type": "keyword"},
"target_sign":{"type": "keyword"},
"create_time":{"type": "long"},
"create_user_id":{"type": "keyword"},
"is_audit":{"type": "long"},
"is_del":{"type": "long"},
"last_modify_user_id":{"type": "keyword"},
"last_modify_time":{"type": "long"},
"init_version":{"type": "long"},
"version":{"type": "long"},
"score":{"type": "keyword"},
"level":{"type": "keyword"},
"example":{"type": "keyword"},
"conflict":{"type": "keyword"},
"srcLangId":{"type": "long"},
"srcLangCN":{"type": "keyword"},
"tarLangId":{"type": "long"},
"tarLangCN":{"type": "keyword"},
"docId":{"type": "keyword"},
"source_simhash":{"type": "keyword"},
"sentence_id":{"type": "long"},
"section_id":{"type": "long"},
"type":{"type": "long"},
"industry":{"type": "long"},
"industry_name":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"querycount":{"type": "long"},
"reviser":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"comment":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
mapping/wechat.json
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
4. flume
datatype.properties
flume-config.properties
#kafka topic
kafkatopic=test100
validation.properties
# file name validation switch
FILENAME_VALIDATION=1
# DATATYPE conversion switch
DATATYPE_TRANSACTION=1
# longitude/latitude validation switch
LONGLAIT_VALIDATION=1
# whether error records are written to ES
ERROR_ES=1
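These switches can be read with ConfigUtil from the Common module, as in the sketch below; "1" means the switch is on, and the classpath location flume/validation.properties is an assumption based on the directory layout above.

```java
import com.hsiehchou.common.config.ConfigUtil;

import java.util.Properties;

public class ValidationSwitches {
    public static void main(String[] args) {
        Properties p = ConfigUtil.getInstance().getProperties("flume/validation.properties");
        boolean validateFileName = "1".equals(p.getProperty("FILENAME_VALIDATION"));
        boolean writeErrorsToEs = "1".equals(p.getProperty("ERROR_ES"));
        System.out.println("file name validation on: " + validateFileName);
        System.out.println("write error records to ES: " + writeErrorsToEs);
    }
}
```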
5. hadoop
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<property>
<name>hadoop.security.instrumentation.requires.admin</name>
<value>false</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>hadoop.ssl.enabled</name>
<value>false</value>
</property>
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///dfs/nn</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>hadoop1:8022</value>
</property>
<property>
<name>dfs.https.address</name>
<value>hadoop1:50470</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>022</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>dfs.client.domain.socket.data.traffic</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
</configuration>
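以上 core-site.xml / hdfs-site.xml 就是客户端访问 HDFS 所依赖的配置。下面是一个读取这些配置并列出 HDFS 根目录的最小示意(假设把这两个 xml 放到 classpath 下即可自动加载,或者像示例那样直接指定 fs.defaultFS;HdfsClientDemo 类名与路径仅为演示,并非项目中的实际代码):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        // core-site.xml / hdfs-site.xml 放在 classpath(如 resources 目录)下会被自动加载
        Configuration conf = new Configuration();
        // 也可以直接指定 NameNode 地址,与 core-site.xml 中 fs.defaultFS 保持一致
        conf.set("fs.defaultFS", "hdfs://hadoop1:8020");
        FileSystem fs = FileSystem.get(conf);
        // 列出根目录,验证客户端配置是否生效(路径仅作演示)
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}
```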
6、hbase
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<property>
<name>hadoop.security.instrumentation.requires.admin</name>
<value>false</value>
</property>
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
</configuration>
hbase-server-config.properties
#hbase 开发环境
need.init.hbase=true
# hbase.zookeeper.quorum=hadoop1.ultiwill.com,hadoop2.ultiwill.com,hadoop3.ultiwill.com
hbase.zookeeper.quorum=hadoop1,hadoop2,hadoop3
hbase.zookeeper.property.clientPort=2181
hbase.rpc.timeout=120000
hbase.client.scanner.timeout.period=120000
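hbase-server-config.properties 中的 zookeeper 地址和超时参数,最终都要设置到 HBase 客户端的 Configuration 中。下面是一个按这些配置建立连接并列出表名的最小示意(HBaseConnDemo 类名仅为演示,并非 xz_bigdata_hbase 模块的实际实现):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // 与 hbase-server-config.properties 中的配置保持一致
        conf.set("hbase.zookeeper.quorum", "hadoop1,hadoop2,hadoop3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.rpc.timeout", "120000");
        conf.set("hbase.client.scanner.timeout.period", "120000");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // 简单验证连接:打印已有表名
            for (TableName name : admin.listTableNames()) {
                System.out.println(name.getNameAsString());
            }
        }
    }
}
```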
hbase-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop1:8020/hbase</value>
</property>
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
<property>
<name>hbase.client.write.buffer</name>
<value>2097152</value>
</property>
<property>
<name>hbase.client.pause</name>
<value>100</value>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>35</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>100</value>
</property>
<property>
<name>hbase.client.keyvalue.maxsize</name>
<value>10485760</value>
</property>
<property>
<name>hbase.ipc.client.allowsInterrupt</name>
<value>true</value>
</property>
<property>
<name>hbase.client.primaryCallTimeout.get</name>
<value>10</value>
</property>
<property>
<name>hbase.client.primaryCallTimeout.multiget</name>
<value>10</value>
</property>
<property>
<name>hbase.fs.tmp.dir</name>
<value>/user/${user.name}/hbase-staging</value>
</property>
<property>
<name>hbase.client.scanner.timeout.period</name>
<value>60000</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
</property>
<property>
<name>hbase.regionserver.thrift.http</name>
<value>false</value>
</property>
<property>
<name>hbase.thrift.support.proxyuser</name>
<value>false</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>60000</value>
</property>
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
<property>
<name>hbase.snapshot.master.timeoutMillis</name>
<value>60000</value>
</property>
<property>
<name>hbase.snapshot.region.timeout</name>
<value>60000</value>
</property>
<property>
<name>hbase.snapshot.master.timeout.millis</name>
<value>60000</value>
</property>
<property>
<name>hbase.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hbase.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>60000</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>zookeeper.znode.rootserver</name>
<value>root-region-server</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop1,hadoop3,hadoop2</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.rest.ssl.enabled</name>
<value>false</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///dfs/nn</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>hadoop1:8022</value>
</property>
<property>
<name>dfs.https.address</name>
<value>hadoop1:50470</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>022</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>dfs.client.domain.socket.data.traffic</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
</configuration>
7、kafka
kafka-data-push-info
--config kafka自动推送数据配置目录
--timeOut 推送超时时间 默认 15 min 单位为分钟
kafka自动推送数据配置:
data.sources 数据源列表。 (例如:data.sources =bhdb1,dpxx)
{source}.source.type 某个数据源的类型。 (数据源分为数据库和文件两大类, 若为数据库 则使用 数据的名称 例如 oracle,mysql,sqlserver等, 否则使用 file)
例如:bhdb1.source.type=oracle 或者 dpxx.source.type=file
数据源为数据库:
{source}.db.name 数据库的名称
{source}.db.host 数据库的ip或者主机名
{source}.db.port 数据库的访问端口, 若不填写则使用该种数据库的默认端口
{source}.db.user 用户名
{source}.db.pwd 密码
{source}.push.topic 推送到topic的全局配置,即该数据库下配置的表没有配置topic的时候,其数据会推送到该topic。
{source}.push.tables 需要推送数据的表列表
{source}.{table}.push.sql 只推送使用该sql查询到的数据 。 不填则表示推送全部。
{source}.{table}.push.adjusterfactory 对推送的数据进行调整 , 必须为com.bh.d406.bigdata.kafka.producer.DataAdjuster的子类 , 需要进行调整数据的时候填写
{source}.{table}.push.topic 该表的数据推送到topic名称 , 若不填则使用全局的topic配置
数据源为文件:
{source}.file.dir 文件目录 (注意:只支持本地目录 )
{source}.file.encoding 文件编码 (默认UTF-8)
{source}.file.extensions 需要过滤的文件格式列表
{source}.file.data.loaderfactory 文件加载器工厂类
{source}.file.data.fields 记录的字段列表 与顺序有关
{source}.file.data.spliter 数据的分割符 默认 \t
{source}.file.skip.firstline 是否跳过第一行数据 false or true
{source}.file.data.adjusterfactory 数据矫正工厂类
{source}.push.thread.num 读取文件的线程数
{source}.push.batch.size 分批推送数据 , 每批数据大小
{source}.push.topic 数据推送的目标topic名称
{source}.store.table 存储的表名
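上面这套 {source}.* 的配置约定,核心是先读 data.sources 拿到数据源列表,再按数据源名作为前缀取各自的参数。下面是一个解析该约定的最小示意(配置文件名与 PushConfigDemo 类名均为假设,真实解析逻辑以推送程序为准):

```java
import java.io.FileInputStream;
import java.util.Properties;

public class PushConfigDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 配置文件路径由 --config 指定,这里用第一个命令行参数代替,文件名仅为假设
        String path = args.length > 0 ? args[0] : "kafka-data-push.properties";
        props.load(new FileInputStream(path));
        // data.sources 为数据源列表,逗号分隔
        for (String source : props.getProperty("data.sources", "").split(",")) {
            source = source.trim();
            if (source.isEmpty()) {
                continue;
            }
            String type = props.getProperty(source + ".source.type");
            String topic = props.getProperty(source + ".push.topic");
            if ("file".equals(type)) {
                System.out.println(source + " -> 文件目录 " + props.getProperty(source + ".file.dir") + ", topic=" + topic);
            } else {
                // 其余情况按数据库处理,type 即数据库名称(oracle、mysql 等)
                System.out.println(source + " -> " + type + " 库 " + props.getProperty(source + ".db.name")
                        + "@" + props.getProperty(source + ".db.host") + ", topic=" + topic);
            }
        }
    }
}
```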
kafka-server-config.properties
#################Kafka 全局配置 #######################
# 格式为host1:port1,host2:port2,
# 这是一个broker列表,用于获得元数据(topics,partitions和replicas),建立起来的socket连接用于发送实际数据,
# 这个列表可以是broker的一个子集,或者一个VIP,指向broker的一个子集
# metadata.broker.list=hadoop1:9092,slaver01:9092,slaver02:9092
metadata.broker.list=hadoop1:9092
# zookeeper列表
zk.connect=hadoop1:2181,hadoop2:2181,hadoop3:2181
# 消息的序列化类,默认的encoder处理一个byte[],返回一个byte[]
# 默认值为 kafka.serializer.DefaultEncoder
serializer.class=kafka.serializer.StringEncoder
# 用来控制一个produce请求怎样才能算完成,准确的说,是有多少broker必须已经提交数据到log文件,并向leader发送ack,可以设置如下的值:
# 0,意味着producer永远不会等待一个来自broker的ack,这就是0.7版本的行为。这个选项提供了最低的延迟,但是持久化的保证是最弱的,当server挂掉的时候会丢失一些数据。
# 1,意味着在leader replica已经接收到数据后,producer会得到一个ack。这个选项提供了更好的持久性,因为在server确认请求成功处理后,client才会返回。只有在数据刚写到leader、还没来得及复制时leader就挂掉,消息才可能丢失。
# -1,意味着在所有的ISR都接收到数据后,producer才得到一个ack。这个选项提供了最好的持久性,只要还有一个replica存活,那么数据就不会丢失。
# 默认值 为 0
request.required.acks=1
# 请求超时时间 默认为 10000
request.timeout.ms=60000
#决定消息是否应在一个后台线程异步发送。
#合法的值为async,表示异步发送;sync表示同步发送。
#设置为async则允许批量发送请求,这会带来更高的吞吐量,但是client的机器挂了的话会丢失还没有发送的数据。
#默认值为 sync
producer.type=sync
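kafka-server-config.properties 中这些键对应的是 Kafka 0.8 的旧版 producer API。下面是一个按同样参数构造 producer 并发送一条消息的最小示意(topic 名称 test_topic 仅为假设):

```java
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class OldProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // 与 kafka-server-config.properties 保持一致
        props.put("metadata.broker.list", "hadoop1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        props.put("request.timeout.ms", "60000");
        props.put("producer.type", "sync");
        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        // topic 名称仅为示例
        producer.send(new KeyedMessage<String, String>("test_topic", "hello kafka"));
        producer.close();
    }
}
```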
8、redis
redis.properties
redis.hostname = 192.168.116.202
redis.port = 6379
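redis.properties 只有主机和端口两项。下面是一个用 Jedis 客户端按该配置连接 Redis 的最小示意(假设 xz_bigdata_redis 模块使用的是 Jedis,示例类名与 key 仅为演示):

```java
import redis.clients.jedis.Jedis;

public class RedisDemo {
    public static void main(String[] args) {
        // 与 redis.properties 中的 redis.hostname / redis.port 保持一致
        Jedis jedis = new Jedis("192.168.116.202", 6379);
        jedis.set("demo:key", "hello redis");
        System.out.println(jedis.get("demo:key"));
        jedis.close();
    }
}
```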
9、spark
hive_fields_mapping.properties
datatype= base,wechat
#base = datatype,idcard,name,age,collecttime,imei
#wechat = datatype,wechat,phone,collecttime,imei
#============================================================base
base.datatype = string
base.idcard = string
base.name = string
base.age = long
base.collecttime = string
base.imei = string
#============================================================wechat
wechat.datatype = string
wechat.wechat = string
wechat.phone = string
wechat.collecttime = string
wechat.imei = string
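hive_fields_mapping.properties 按“数据类型.字段名 = 字段类型”的方式描述每张 Hive 表的结构。下面是一个读取该映射并拼出建表语句的最小示意(仅为说明配置含义,字段顺序和真实建表逻辑以 xz_bigdata_spark 模块为准):

```java
import java.io.InputStream;
import java.util.Properties;

public class HiveMappingDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 假设 hive_fields_mapping.properties 位于 classpath 下
        try (InputStream in = HiveMappingDemo.class.getClassLoader()
                .getResourceAsStream("hive_fields_mapping.properties")) {
            props.load(in);
        }
        // datatype = base,wechat,按数据类型逐个拼出建表语句
        for (String type : props.getProperty("datatype").split(",")) {
            type = type.trim();
            StringBuilder ddl = new StringBuilder("CREATE TABLE IF NOT EXISTS " + type + " (");
            String prefix = type + ".";
            boolean first = true;
            // stringPropertyNames 不保证顺序,真实建表时字段顺序需要另行约定
            for (String key : props.stringPropertyNames()) {
                if (key.startsWith(prefix)) {
                    if (!first) {
                        ddl.append(", ");
                    }
                    ddl.append(key.substring(prefix.length())).append(" ").append(props.getProperty(key).trim());
                    first = false;
                }
            }
            ddl.append(")");
            System.out.println(ddl);
        }
    }
}
```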
relation.properties
#需要关联的字段
relationfield = phone_mac,phone,username,send_mail,imei,imsi
complex_relationfield = card,phone_mac,phone,username,send_mail,imei,imsi
spark-batch-config.properties
# spark 常规 配置 不包括 流式处理的 配置
#################### 全局 #############################
# 在用户没有指定时,用于分布式shuffle操作(groupByKey,reduceByKey等等)的默认任务数(shuffle过程中task的个数)
# 默认为 8
spark.default.parallelism=16
# Spark用于缓存的内存大小所占用的Java堆的比率。这个不应该大于JVM中老年代所分配的内存大小
# 默认情况下老年代大小是堆大小的2/3,但是你可以通过配置你的老年代的大小,然后再去增加这个比率
# 默认为 0.66
# spark 1.6 后 过期
# spark.storage.memoryFraction=0.66
# 在spark1.6.0版本默认大小为: (“Java Heap” – 300MB) * 0.75
# 例如:如果堆内存大小有4G,将有2847MB的Spark Memory,Spark Memory=(4*1024MB-300)*0.75=2847MB
# 这部分内存会被分成两部分:Storage Memory和Execution Memory
# 而且这两部分的边界由spark.memory.storageFraction参数设定,默认是0.5即50%
# 新的内存管理模型中的优点是,这个边界不是固定的,在内存压力下这个边界是可以移动的
# 如一个区域内存不够用时可以从另一区域借用内存
spark.memory.fraction=0.75
spark.memory.storageFraction=0.5
# 是否要压缩序列化的RDD分区(比如,StorageLevel.MEMORY_ONLY_SER)
# 在消耗一点额外的CPU时间的代价下,可以极大的提高减少空间的使用
# 默认为 false
spark.rdd.compress=true
# The codec used to compress internal data such as RDD partitions,
# broadcast variables and shuffle outputs. By default,
# Spark provides three codecs: lz4, lzf, and snappy. You can also use fully qualified class names to specify the codec,
# e.g.
# 1. org.apache.spark.io.LZ4CompressionCodec,
# 2. org.apache.spark.io.LZFCompressionCodec,
# 3. org.apache.spark.io.SnappyCompressionCodec. default
spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
# Block size (in bytes) used in Snappy compression,
# in the case when Snappy compression codec is used.
# Lowering this block size will also lower shuffle memory usage when Snappy is used.
# default : 32K
spark.io.compression.snappy.blockSize=32768
# 同时获取每一个分解任务的时候,映射输出文件的最大的尺寸(以兆为单位)。
# 由于对每个输出都需要我们去创建一个缓冲区去接受它,这个属性值代表了对每个分解任务所使用的内存的一个上限值,
# 因此除非你机器内存很大,最好还是配置一下这个值。
# 默认48
spark.reducer.maxSizeInFlight=48
# 这个配置参数仅适用于HashShuffleMananger的实现,同样是为了解决生成过多文件的问题,
# 采用的方式是在不同批次运行的Map任务之间重用Shuffle输出文件,也就是说合并的是不同批次的Map任务的输出数据,
# 但是每个Map任务所需要的文件还是取决于Reduce分区的数量,因此,它并不减少同时打开的输出文件的数量,
# 因此对内存使用量的减少并没有帮助。只是HashShuffleManager里的一个折中的解决方案。
# 默认为false
#spark.shuffle.consolidateFiles=false
#java.io.Externalizable. Java serialization is flexible but often quite slow, and leads to large serialized formats for many classes.
#default java.io.Serializable
#spark.serializer=org.apache.spark.serializer.KryoSerializer
# Speculation是在任务调度的时候,如果没有适合当前本地性要求的任务可供运行,
# 将跑得慢的任务在空闲计算资源上再度调度的行为,这些参数调整这些行为的频率和判断指标,默认是不使用Speculation的
# 默认为false
# 慎用 可能导致数据重复的现象
#spark.speculation=true
# task失败重试次数
# 默认为4
spark.task.maxFailures=8
# Spark 是有任务的黑名单机制的,但是这个配置在官方文档里面并没有写,可以设置下面的参数,
# 比如设置成一分钟之内不要再把任务发到这个 Executor 上了,单位是毫秒。
# spark.scheduler.executorTaskBlacklistTime=60000
# 超过这个时间,可以执行 NODE_LOCAL 的任务
# 默认为 3000
spark.locality.wait.process=1
# 超过这个时间,可以执行 RACK_LOCAL 的任务
# 默认为 3000
spark.locality.wait.node=3
# 超过这个时间,可以执行 ANY 的任务
# 默认为 3000
spark.locality.wait.rack=1000
#################### yarn ###########################
# 提交的jar文件 的副本数
# 默认为 3
spark.yarn.submit.file.replication=1
# container中的线程数
# 默认为 25
spark.yarn.containerLauncherMaxThreads=25
# 解决yarn-cluster模式下 对处理 permGen space oom异常很有用
# spark.yarn.am.extraJavaOptions=
# spark.driver.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024M
# 对象指针压缩 和 gc日志收集打印
# spark.executor.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024M -XX:MaxDirectMemorySize=1536M -XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
# -XX:-UseGCOverheadLimit
# GC默认情况下有一个限制,默认是GC时间不能超过2%的CPU时间,但是如果大量对象创建(在Spark里很容易出现,代码模式就是一个RDD转下一个RDD),
# 就会导致大量的GC时间,从而出现OutOfMemoryError: GC overhead limit exceeded,可以通过设置-XX:-UseGCOverheadLimit关掉它。
# -XX:+UseCompressedOops 可以压缩指针(8字节变成4字节)
spark.executor.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024m -XX:+CMSClassUnloadingEnabled -Xmn512m -XX:MaxTenuringThreshold=15 -XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCompressedOops -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError
# 当shuffle缓存的数据超过此值 强制刷磁盘 单位为 byte
# spark.shuffle.spill.initialMemoryThreshold=671088640
################### AKKA 相关 ##########################
# 在控制层面通信(序列化任务和任务结果)时,消息大小的最大值,单位是MB。
# 如果你需要给驱动器发回大尺寸的结果(比如使用在一个大的数据集上面使用collect()方法),那么你就该增加这个值了。
# 默认为 10
spark.akka.frameSize=1024
# 用于通信的actor线程数量。如果驱动器有很多CPU核心,那么在大集群上可以增大这个值。
# 默认为 4
spark.akka.threads=8
# Spark节点之间通信的超时时间,以秒为单位
# 默认为20s
spark.akka.timeout=120
# exector的堆外内存(不会占用 分配给executor的jvm内存)
# spark.yarn.executor.memoryOverhead=2560
spark-start-config.properties
# Spark 任务 使用java -cp 方式启动的参数配置
#
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/spark-assembly.jar
spark.authenticate=false
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.yarn.historyServer.address=http://BH-LAN-Virtual-hadoop-9:18088
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.eventLog.enabled=true
spark.dynamicAllocation.schedulerBacklogTimeout=1
SPARK_SUBMIT=true
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.ui.killEnabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.executorIdleTimeout=60
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.shuffle.service.port=7337
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.dynamicAllocation.enabled=true
#/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/*
#/etc/spark/conf.cloudera.spark_on_yarn/
#/etc/hadoop/conf.cloudera.yarn/
spark.submit.deployMode=client
spark.app.name=default
spark.master=yarn-client
spark.driver.memory=1g
spark.executor.instances=1
spark.executor.memory=4g
spark.executor.cores=2
spark.jars=
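spark-start-config.properties 里的键基本都是标准的 spark.* 配置,用 java -cp 方式启动时需要自己把它们装载进 SparkConf。下面是一个装载并打印这些配置的最小示意(文件路径与类名仅为演示,并非项目启动器的实际实现):

```java
import java.io.FileInputStream;
import java.util.Properties;
import org.apache.spark.SparkConf;

public class SparkStartConfigDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 配置文件路径仅为演示,实际由启动脚本传入
        props.load(new FileInputStream("spark-start-config.properties"));
        SparkConf conf = new SparkConf();
        for (String key : props.stringPropertyNames()) {
            // 只把 spark.* 开头的键放进 SparkConf,SPARK_SUBMIT 这类环境标记跳过
            if (key.startsWith("spark.")) {
                conf.set(key, props.getProperty(key));
            }
        }
        // 打印最终生效的配置,确认 master、executor 内存等是否符合预期
        System.out.println(conf.toDebugString());
    }
}
```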
spark-streaming-config.properties
# spark 流式处理的 配置
# job的并行度
# 默认为 1
spark.streaming.concurrentJobs=1
# Spark记忆任何元数据(stages生成,任务生成等等)的时间(秒)。周期性清除保证在这个时间之前的元数据会被遗忘。
#当长时间几小时,几天的运行Spark的时候设置这个是很有用的。注意:任何内存中的RDD只要过了这个时间就会被清除掉。
# 默认 disable
spark.cleaner.ttl=3600
# 将不再使用的缓存数据清除
# 默认为false
spark.streaming.unpersist=true
# 从网络中批量接受对象时的持续时间 , 单位 ms。
# 默认为200ms
spark.streaming.blockInterval=200
# 控制Receiver速度 单位 s
# 因为当streaming程序的数据源的数据量突然变大巨大,可能会导致streaming被撑住导致吞吐不过来,所以可以考虑对于最大吞吐做一下限制。
# 默认为 100000
spark.streaming.receiver.maxRate=10000
# kafka每个分区最大的读取速度 单位 s
# 控制kafka读取的量
spark.streaming.kafka.maxRatePerPartition=50
# 读取kafka的分区最新offset的最大尝试次数
# 默认为1
spark.streaming.kafka.maxRetries=5
# 1、为什么引入Backpressure
# 默认情况下,Spark Streaming通过Receiver以生产者生产数据的速率接收数据,计算过程中会出现batch processing time > batch interval的情况,
# 其中batch processing time 为实际计算一个批次花费时间, batch interval为Streaming应用设置的批处理间隔。
# 这意味着Spark Streaming的数据接收速率高于Spark从队列中移除数据的速率,也就是数据处理能力低,在设置间隔内不能完全处理当前接收速率接收的数据。
# 如果这种情况持续过长的时间,会造成数据在内存中堆积,导致Receiver所在Executor内存溢出等问题(如果设置StorageLevel包含disk, 则内存存放不下的数据会溢写至disk, 加大延迟)。
# Spark 1.5以前版本,用户如果要限制Receiver的数据接收速率,可以通过设置静态配制参数“spark.streaming.receiver.maxRate”的值来实现,
# 此举虽然可以通过限制接收速率,来适配当前的处理能力,防止内存溢出,但也会引入其它问题。比如:producer数据生产高于maxRate,当前集群处理能力也高于maxRate,这就会造成资源利用率下降等问题。
# 为了更好的协调数据接收速率与资源处理能力,Spark Streaming 从v1.5开始引入反压机制(back-pressure),通过动态控制数据接收速率来适配集群数据处理能力。
# 2、Backpressure
# Spark Streaming Backpressure: 根据JobScheduler反馈作业的执行信息来动态调整Receiver数据接收率。
# 通过属性“spark.streaming.backpressure.enabled”来控制是否启用backpressure机制,默认值false,即不启用
spark.streaming.backpressure.enabled=true
spark.streaming.backpressure.initialRate=200
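上面关于反压(back-pressure)的说明,落到代码上就是在构造 StreamingContext 之前把相关参数设置进 SparkConf。下面是一个最小示意(local[2] 与 socketTextStream 仅为本地演示,实际作业使用 yarn-client 与 Kafka 数据源):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BackpressureDemo {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("backpressure-demo")
                .setMaster("local[2]")   // 仅作本地演示,实际使用 yarn-client
                .set("spark.streaming.backpressure.enabled", "true")
                .set("spark.streaming.backpressure.initialRate", "200")
                .set("spark.streaming.kafka.maxRatePerPartition", "50");
        // 批处理间隔 5 秒,反压机制会根据上一批的处理情况动态调整接收速率
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
        jssc.socketTextStream("localhost", 9999).print();
        jssc.start();
        jssc.awaitTermination();
    }
}
```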
datatype/fieldtype.properties
hive/hive-server-config.properties
# hive 开发环境
hive/hive-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop1:9083</value>
</property>
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>300</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask.size</name>
<value>20971520</value>
</property>
<property>
<name>hive.optimize.bucketmapjoin.sortedmerge</name>
<value>false</value>
</property>
<property>
<name>hive.smbjoin.cache.rows</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.logging.operation.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/hadoop_log/log/hive/operation_logs</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>-1</value>
</property>
<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>67108864</value>
</property>
<property>
<name>hive.exec.copyfile.maxsize</name>
<value>33554432</value>
</property>
<property>
<name>hive.exec.reducers.max</name>
<value>1099</value>
</property>
<property>
<name>hive.vectorized.groupby.checkinterval</name>
<value>4096</value>
</property>
<property>
<name>hive.vectorized.groupby.flush.percent</name>
<value>0.1</value>
</property>
<property>
<name>hive.compute.query.using.stats</name>
<value>false</value>
</property>
<property>
<name>hive.vectorized.execution.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.vectorized.execution.reduce.enabled</name>
<value>false</value>
</property>
<property>
<name>hive.merge.mapfiles</name>
<value>true</value>
</property>
<property>
<name>hive.merge.mapredfiles</name>
<value>false</value>
</property>
<property>
<name>hive.cbo.enable</name>
<value>false</value>
</property>
<property>
<name>hive.fetch.task.conversion</name>
<value>minimal</value>
</property>
<property>
<name>hive.fetch.task.conversion.threshold</name>
<value>268435456</value>
</property>
<property>
<name>hive.limit.pushdown.memory.usage</name>
<value>0.1</value>
</property>
<property>
<name>hive.merge.sparkfiles</name>
<value>true</value>
</property>
<property>
<name>hive.merge.smallfiles.avgsize</name>
<value>16777216</value>
</property>
<property>
<name>hive.merge.size.per.task</name>
<value>268435456</value>
</property>
<property>
<name>hive.optimize.reducededuplication</name>
<value>true</value>
</property>
<property>
<name>hive.optimize.reducededuplication.min.reducer</name>
<value>4</value>
</property>
<property>
<name>hive.map.aggr</name>
<value>true</value>
</property>
<property>
<name>hive.map.aggr.hash.percentmemory</name>
<value>0.5</value>
</property>
<property>
<name>hive.optimize.sort.dynamic.partition</name>
<value>false</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>1369020825</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>966367641</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>1</value>
</property>
<property>
<name>spark.yarn.driver.memoryOverhead</name>
<value>102</value>
</property>
<property>
<name>spark.yarn.executor.memoryOverhead</name>
<value>230</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.dynamicAllocation.initialExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.minExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.maxExecutors</name>
<value>2147483647</value>
</property>
<property>
<name>hive.metastore.execute.setugi</name>
<value>true</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>hadoop1,hadoop3,hadoop2</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hive.zookeeper.namespace</name>
<value>hive_zookeeper_namespace_hive</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.class</name>
<value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>false</value>
</property>
<property>
<name>spark.shuffle.service.enabled</name>
<value>true</value>
</property>
</configuration>
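上面的 hive-site.xml 主要供客户端读取 metastore 地址等信息。作为补充,下面给出一个通过 JDBC 访问 HiveServer2 的最小示意(假设 HiveServer2 运行在 hadoop1 的默认端口 10000,该端口并未出现在上面的配置里,用户名密码也仅为占位):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // 加载 Hive JDBC 驱动(需要 hive-jdbc 依赖)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // 端口 10000 为 HiveServer2 默认端口,属于假设
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hadoop1:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```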
五.Flume开发
xz_bigdata_flume
FTP -> FlumeSource -> 拦截器 -> FlumeChannel -> FlumeSink -> Kafka
自定义的内容有:FlumeSource、拦截器、FlumeSink
1、maven冲突解决和pom.xml
1.1 安装Maven Helper插件,在Settings里面的Plugins里面搜索Maven Helper,点击Install,安装完毕。
1.2 ETL包括数据的抽取、转换、加载
①数据抽取:从源数据源系统抽取目的数据源系统需要的数据;
②数据转换:将从源数据源获取的数据按照业务需求,转换成目的数据源要求的形式,并对错误、不一致的数据进行清洗和加工;
③数据加载:将转换后的数据装载到目的数据源。
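按上面抽取、转换、加载三步,本工程处理的一行 txt 数据大致会经历下面这样的过程,这里给出一个极简示意(示例数据与字段顺序仅为演示,真实清洗逻辑在后面的拦截器与 DataCheck 中):

```java
import java.util.HashMap;
import java.util.Map;

public class EtlLineDemo {
    public static void main(String[] args) {
        // 抽取:一行以 \t 分隔的原始数据(示例数据,字段顺序参考通用字段定义)
        String line = "460011418603055\t24.000000\t25.000000\taa-aa-aa-aa-aa-aa";
        String[] fields = {"imsi", "longitude", "latitude", "phone_mac"};
        // 转换:按字段名封装为 Map,并做简单清洗(去空格、MAC 统一为小写)
        String[] values = line.split("\t");
        Map<String, String> record = new HashMap<String, String>();
        for (int i = 0; i < fields.length && i < values.length; i++) {
            record.put(fields[i], values[i].trim().toLowerCase());
        }
        // 加载:这里仅打印,实际会写入 Kafka / ES / HBase
        System.out.println(record);
    }
}
```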
1.3 pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_flume</artifactId>
<name>xz_bigdata_flume</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<flume-ng.version>1.6.0</flume-ng.version>
<hadoop.version>2.6.0</hadoop.version>
<jdom.version>1.0</jdom.version>
<c3p0.version>0.9.5</c3p0.version>
<hadoop.version>2.6.0</hadoop.version>
<mybatis.version>3.1.1</mybatis.version>
<zookeeper.version>3.4.6</zookeeper.version>
<net.sf.json.version>2.2.3</net.sf.json.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>fastjson</artifactId>
<groupId>com.alibaba</groupId>
</exclusion>
<exclusion>
<artifactId>commons-configuration</artifactId>
<groupId>commons-configuration</groupId>
</exclusion>
<exclusion>
<artifactId>commons-io</artifactId>
<groupId>commons-io</groupId>
</exclusion>
<exclusion>
<artifactId>commons-lang</artifactId>
<groupId>commons-lang</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_kafka</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>snappy-java</artifactId>
<groupId>org.xerial.snappy</groupId>
</exclusion>
<exclusion>
<artifactId>scala-library</artifactId>
<groupId>org.scala-lang</groupId>
</exclusion>
<exclusion>
<artifactId>zookeeper</artifactId>
<groupId>org.apache.zookeeper</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-api</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
</exclusions>
</dependency>
<!--flume核心依赖-->
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>${flume-ng.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>slf4j-api</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>guava</artifactId>
<groupId>com.google.guava</groupId>
</exclusion>
<exclusion>
<artifactId>commons-codec</artifactId>
<groupId>commons-codec</groupId>
</exclusion>
<exclusion>
<artifactId>commons-logging</artifactId>
<groupId>commons-logging</groupId>
</exclusion>
<exclusion>
<artifactId>jetty</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-util</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>commons-io</artifactId>
<groupId>commons-io</groupId>
</exclusion>
<exclusion>
<artifactId>commons-lang</artifactId>
<groupId>commons-lang</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-sdk</artifactId>
<version>${flume-ng.version}-${cdh.version}</version>
</dependency>
<!--flume配置依赖-->
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-configuration</artifactId>
<version>${flume-ng.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>guava</artifactId>
<groupId>com.google.guava</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>jdom</groupId>
<artifactId>jdom</artifactId>
<version>${jdom.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
</dependencies>
<build>
<defaultGoal>compile</defaultGoal>
<sourceDirectory>src/main/java/</sourceDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>copy</id>
<phase>install</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>
${project.build.directory}/jars
</outputDirectory>
<excludeArtifactIds>javaee-api</excludeArtifactIds>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>2.7</version>
<configuration>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
2、自定义source
2.1 继承AbstractSource 实现 Configurable, PollableSource接口
package com.hsiehchou.flume.source;
import com.hsiehchou.flume.constant.FlumeConfConstant;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.utils.FileUtilsStronger;
import org.apache.commons.io.FileUtils;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.PollableSource;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.apache.log4j.Logger;
import java.io.File;
import java.util.*;
/**
* 固定写法,自定义Source 直接继承 AbstractSource 和 实现 Configurable, PollableSource 接口
* 可参照官网 http://flume.apache.org/releases/content/1.9.0/FlumeDeveloperGuide.html#source
*/
public class FolderSource extends AbstractSource implements Configurable, PollableSource {
private final Logger logger = Logger.getLogger(FolderSource.class);
//tier1.sources.source1.sleeptime=5
//tier1.sources.source1.filenum=3000
//tier1.sources.source1.dirs =/usr/chl/data/filedir/
//tier1.sources.source1.successfile=/usr/chl/data/filedir_successful/
//以下为配置在flume.conf文件中
//读取的文件目录
private String dirStr;
//读取的文件目录,如果多个,以","分割,在flume.conf里面配置
private String[] dirs;
//处理成功的文件写入的目录
private String successfile;
//睡眠时间
private long sleeptime = 5;
//每批文件数量
private int filenum = 500;
//以下为配置在txtparse.properties文件中
//读取的所有文件集合
private Collection<File> allFiles;
//一批处理的文件大小
private List<File> listFiles;
private ArrayList<Event> eventList = new ArrayList<Event>();
/**
* @param context 拿到flume配置里面的所有参数
*/
@Override
public void configure(Context context) {
logger.info("开始初始化flume参数");
initFlumeParams(context);
logger.info("初始化flume参数成功");
}
@Override
public Status process() {
//定义处理逻辑
try {
Thread.currentThread().sleep(sleeptime * 1000);
} catch (InterruptedException e) {
logger.error(null, e);
}
Status status = null;
try {
// for (String dir : dirs) {
logger.info("dirStr===========" + dirStr);
//TODO 1.监控目录下面的所有文件
//读取目录下的文件,获取目录下所有以 "txt", "bcp" 结尾的文件
allFiles = FileUtils.listFiles(new File(dirStr), new String[]{"txt", "bcp"}, true);
//如果目录下文件总数大于阈值,则只取 filenum 个文件进行处理
if (allFiles.size() >= filenum) {
//文件数量大于3000 只取3000条
listFiles = ((List<File>) allFiles).subList(0, filenum);
} else {
//文件数量小于3000,取所有文件进行处理
listFiles = ((List<File>) allFiles);
}
//TODO 2.遍历所有的文件进行解析
if (listFiles.size() > 0) {
for (File file : listFiles) {
//文件名是需要传到channel中的
String fileName = file.getName();
//解析文件 获取文件名及文件内容 文件绝对路径 文件内容
Map<String, Object> stringObjectMap = FileUtilsStronger.parseFile(file, successfile);
//返回的内容2个参数 一个是文件绝对路径 另一个是lines文件的所有内容
//获取文件绝对路径
String absoluteFilename = (String) stringObjectMap.get(MapFields.ABSOLUTE_FILENAME);
//获取文件内容
List<String> lines = (List<String>) stringObjectMap.get(MapFields.VALUE);
//TODO 解析出来之后,需要把解析出来的数据封装为Event
if (lines != null && lines.size() > 0) {
//遍历读取的内容
for (String line : lines) {
//封装event Header 将文件名及文件绝对路径通过header传送到channel中
//构建event头
Map<String, String> map = new HashMap<String, String>();
//文件名
map.put(MapFields.FILENAME, fileName);
//文件绝对路径
map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
//构建event
SimpleEvent event = new SimpleEvent();
//把读取的一行数据转成字节
byte[] bytes = line.getBytes();
event.setBody(bytes);
event.setHeaders(map);
eventList.add(event);
}
}
try {
if (eventList.size() > 0) {
//获取channelProcessor
ChannelProcessor channelProcessor = getChannelProcessor();
//通过channelProcessor把eventList发送出去,可以通过拦截器进行拦截
channelProcessor.processEventBatch(eventList);
logger.info("批量推送到 拦截器 数据大小为" + eventList.size());
}
eventList.clear();
} catch (Exception e) {
eventList.clear();
logger.error("发送数据到channel失败", e);
} finally {
eventList.clear();
}
}
}
// 处理成功,返回成功状态
status = Status.READY;
return status;
} catch (Exception e) {
status = Status.BACKOFF;
logger.error("异常", e);
return status;
}
}
/**
* 初始化flume参数
* @param context
*/
public void initFlumeParams(Context context) {
//读取flume.conf配置文件,初始化参数
try {
//文件处理目录
//监控的文件目录
dirStr = context.getString(FlumeConfConstant.DIRS);
//监控多个目录
dirs = dirStr.split(",");
//成功处理的文件存放目录
successfile = context.getString(FlumeConfConstant.SUCCESSFILE);
//每批处理文件个数
filenum = context.getInteger(FlumeConfConstant.FILENUM);
//睡眠时间
sleeptime = context.getLong(FlumeConfConstant.SLEEPTIME);
logger.info("dirStr============" + dirStr);
logger.info("dirs==============" + dirs);
logger.info("successfile=======" + successfile);
logger.info("filenum===========" + filenum);
logger.info("sleeptime=========" + sleeptime);
} catch (Exception e) {
logger.error("初始化flume参数失败", e);
}
}
@Override
public long getBackOffSleepIncrement() {
return 0;
}
@Override
public long getMaxBackOffSleepInterval() {
return 0;
}
}
2.2 实现process()方法
此处代码已经在2.1里面,不用再写了
source/MySource.java—Flume官网上的案例
package com.hsiehchou.flume.source;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
public class MySource extends AbstractSource implements Configurable, PollableSource {
private String myProp;
/**
* 配置读取
* @param context
*/
@Override
public void configure(Context context) {
String myProp = context.getString("myProp", "defaultValue");
// Process the myProp value (e.g. validation, convert to another type, ...)
// Store myProp for later retrieval by process() method
this.myProp = myProp;
}
/**
* 定义自己的业务逻辑
* @return
* @throws EventDeliveryException
*/
@Override
public Status process() throws EventDeliveryException {
Status status = null;
try {
// This try clause includes whatever Channel/Event operations you want to do
// Receive new data
//需要把自己的数据封装为event进行传输
Event e = new SimpleEvent();
// Store the Event into this Source's associated Channel(s)
getChannelProcessor().processEvent(e);
status = Status.READY;
} catch (Throwable t) {
// Log exception, handle individual exceptions as needed
status = Status.BACKOFF;
// re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
} finally {
}
return status;
}
@Override
public long getBackOffSleepIncrement() {
return 0;
}
@Override
public long getMaxBackOffSleepInterval() {
return 0;
}
@Override
public void start() {
// Initialize the connection to the external client
}
@Override
public void stop () {
// Disconnect from external client and do any additional cleanup
// (e.g. releasing resources or nulling-out field values) ..
}
}
3、自定义interceptor—数据清洗过滤器
3.1实现Interceptor 接口
package com.hsiehchou.flume.interceptor;
import com.alibaba.fastjson.JSON;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.service.DataCheck;
import org.apache.commons.io.Charsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.interceptor.Interceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
* 数据清洗过滤器
*/
public class DataCleanInterceptor implements Interceptor {
private static final Logger LOG = LoggerFactory.getLogger(DataCleanInterceptor.class);
//datatpye.properties
//private static Map<String,ArrayList<String>> dataMap = DataTypeProperties.dataTypeMap;
/**
* 初始化
*/
@Override
public void initialize() {
}
/**
* 单条处理
* 拦截方法。数据解析,封装,数据清洗
* @param event
* @return
*/
@Override
public Event intercept(Event event) {
SimpleEvent eventNew = new SimpleEvent();
try {
LOG.info("拦截器Event开始执行");
Map<String, String> map = parseEvent(event);
if(map == null){
return null;
}
String lineJson = JSON.toJSONString(map);
LOG.info("拦截器推送数据到channel:" +lineJson);
eventNew.setBody(lineJson.getBytes());
} catch (Exception e) {
LOG.error(null,e);
}
return eventNew;
}
/**
* 批处理
* @param events
* @return
*/
@Override
public List<Event> intercept(List<Event> events) {
List<Event> list = new ArrayList<Event>();
for (Event event : events) {
Event intercept = intercept(event);
if (intercept != null) {
list.add(intercept);
}
}
return list;
}
@Override
public void close() {
}
/**
* 数据解析
* @param event
* @return
*/
public static Map<String,String> parseEvent(Event event){
if (event == null) {
return null;
}
//000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
String line = new String(event.getBody(), Charsets.UTF_8);
//文件名 和 文件绝对路径
String filename = event.getHeaders().get(MapFields.FILENAME);
String absoluteFilename = event.getHeaders().get(MapFields.ABSOLUTE_FILENAME);
//String转map,进行数据校验,检验错误入ES错误表
Map<String, String> map = DataCheck.txtParseAndalidation(line,filename,absoluteFilename);
return map;
//wechat_source1_1111115.txt
//String[] fileNames = filename.split("_");
// String转map,并进行数据长度校验,校验错误入ES错误表
//Map<String, String> map = JZDataCheck.txtParse(type, line, source, filename,absoluteFilename);
//Map<String,String> map = new HashMap<>();
//000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
//String[] split = line.split("\t");
//数据类别
//String dataType = fileNames[0];
//imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
//ArrayList<String> fields = dataMap.get(dataType);
//for (int i = 0; i < split.length; i++) {
// map.put(fields.get(i),split[i]);
//}
//添加ID
//map.put(MapFields.ID, UUID.randomUUID().toString().replace("-",""));
// map.put(MapFields.TABLE, dataType);
// map.put(MapFields.FILENAME, filename);
// map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
// Map<String, String> map = DataCheck.txtParseAndalidation(line,filename,absoluteFilename);
// return map;
}
/**
* 实例化创建
*/
public static class Builder implements Interceptor.Builder {
@Override
public void configure(Context context) {
}
@Override
public Interceptor build() {
return new DataCleanInterceptor();
}
}
}
4、utils工具类
utils/FileUtilsStronger.java
package com.hsiehchou.flume.utils;
import com.hsiehchou.common.time.TimeTranstationUtils;
import com.hsiehchou.flume.fields.MapFields;
import org.apache.commons.io.FileUtils;
import org.apache.log4j.Logger;
import java.io.File;
import java.util.*;
import static java.io.File.separator;
public class FileUtilsStronger {
private static final Logger logger = Logger.getLogger(FileUtilsStronger.class);
/**
* @param file
* @param path
*/
public static Map<String,Object> parseFile(File file, String path) {
Map<String,Object> map=new HashMap<String,Object>();
List<String> lines;
String fileNew = path+ TimeTranstationUtils.Date2yyyy_MM_dd()+getDir(file);
try {
if((new File(fileNew+file.getName())).exists()){
try{
logger.info("文件名已经存在,开始删除同名已经存在文件"+file.getAbsolutePath());
file.delete();
logger.info("删除同名已经存在文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.error("删除同名已经存在文件"+file.getAbsolutePath()+"失败",e);
}
}else{
lines = FileUtils.readLines(file);
map.put(MapFields.ABSOLUTE_FILENAME,fileNew+file.getName());
map.put(MapFields.VALUE,lines);
FileUtils.moveToDirectory(file, new File(fileNew), true);
logger.info("移动文件到"+file.getAbsolutePath()+"到"+fileNew+"成功");
}
} catch (Exception e) {
logger.error("移动文件" + file.getAbsolutePath() + "到" + fileNew + "失败", e);
}
return map;
}
/**
* @param file
* @param path
*/
public static List<String> chanmodName(File file, String path) {
List<String> lines=null;
try {
if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
logger.warn("文件名已经存在,开始删除同名文件" +path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName());
try{
file.delete();
logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
}
}else{
lines = FileUtils.readLines(file);
FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
}
} catch (Exception e) {
logger.error("移动文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
}
return lines;
}
/**
* @param file
* @param path
*/
public static void moveFile2unmanage(File file, String path) {
try {
if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
logger.warn("文件名已经存在,开始删除同名文件" +file.getAbsolutePath());
try{
file.delete();
logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
}
}else{
FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
//logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
}
} catch (Exception e) {
logger.error("移动错误文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
}
}
/**
* @param file
* @param path
*/
public static void shnegtingChanmodName(File file, String path) {
try {
if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
logger.warn("文件名已经存在,开始删除同名文件" +path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName());
try{
file.delete();
logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
}
}else{
FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
}
} catch (Exception e) {
logger.error("移动文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
}
}
/**
* 获取文件父目录
* @param file
* @return
*/
public static String getDir(File file){
String dir=file.getParent();
StringTokenizer dirs = new StringTokenizer(dir, separator);
List<String> list=new ArrayList<String>();
while(dirs.hasMoreTokens()){
list.add((String)dirs.nextElement());
}
String str="";
for(int i=2;i<list.size();i++){
str=str+separator+list.get(i);
}
return str+"/";
}
}
utils/Validation.java—验证工具类
package com.hsiehchou.flume.utils;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* 验证工具类
*/
@Deprecated
public class Validation {
// ------------------常量定义
/**
* Email正则表达式=
* "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$"
* ;
*/
// public static final String EMAIL =
// "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";;
public static final String EMAIL = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)+";
/**
* 电话号码正则表达式=
* (^(\d{2,4}[-_-—]?)?\d{3,8}([-_-—]?\d{3,8})?([-_-—]?\d{1,7})?$)|
* (^0?1[35]\d{9}$)
*/
public static final String PHONE = "(^(\\d{2,4}[-_-—]?)?\\d{3,8}([-_-—]?\\d{3,8})?([-_-—]?\\d{1,7})?$)|(^0?1[35]\\d{9}$)";
/**
* 手机号码正则表达式=^(13[0-9]|15[0-9]|18[0-9])\d{8}$
*/
public static final String MOBILE = "^((13[0-9])|(14[5-7])|(15[^4])|(17[0-8])|(18[0-9]))\\d{8}$";
/**
* Integer正则表达式 ^-?(([1-9]\d*$)|0)
*/
public static final String INTEGER = "^-?(([1-9]\\d*$)|0)";
/**
* 正整数正则表达式 >=0 ^[1-9]\d*|0$
*/
public static final String INTEGER_NEGATIVE = "^[1-9]\\d*|0$";
/**
* 负整数正则表达式 <=0 ^-[1-9]\d*|0$
*/
public static final String INTEGER_POSITIVE = "^-[1-9]\\d*|0$";
/**
* Double正则表达式 ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
*/
public static final String DOUBLE = "^-?([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0)$";
/**
* 正Double正则表达式 >=0 ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$
*/
public static final String DOUBLE_NEGATIVE = "^[1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0$";
/**
* 负Double正则表达式 <= 0 ^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$
*/
public static final String DOUBLE_POSITIVE = "^(-([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*))|0?\\.0+|0$";
/**
* 年龄正则表达式 ^(?:[1-9][0-9]?|1[01][0-9]|120)$ 匹配0-120岁
*/
public static final String AGE = "^(?:[1-9][0-9]?|1[01][0-9]|120)$";
/**
* 邮编正则表达式 [0-9]\d{5}(?!\d) 国内6位邮编
*/
public static final String CODE = "[0-9]\\d{5}(?!\\d)";
/**
* 匹配由数字、26个英文字母或者下划线组成的字符串 ^\w+$
*/
public static final String STR_ENG_NUM_ = "^\\w+$";
/**
* 匹配由数字和26个英文字母组成的字符串 ^[A-Za-z0-9]+$
*/
public static final String STR_ENG_NUM = "^[A-Za-z0-9]+";
/**
* 匹配由26个英文字母组成的字符串 ^[A-Za-z]+$
*/
public static final String STR_ENG = "^[A-Za-z]+$";
/**
* 过滤特殊字符串正则 regEx=
* "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
*/
public static final String STR_SPECIAL = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
/***
* 日期正则 支持: YYYY-MM-DD YYYY/MM/DD YYYY_MM_DD YYYYMMDD YYYY.MM.DD的形式
*/
public static final String DATE_ALL = "((^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(10|12|0?[13578])([-\\/\\._]?)(3[01]|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(11|0?[469])([-\\/\\._]?)(30|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(0?2)([-\\/\\._]?)(2[0-8]|1[0-9]|0?[1-9])$)|(^([2468][048]00)([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([3579][26]00)"
+ "([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][0][48])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][0][48])([-\\/\\._]?)"
+ "(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][2468][048])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][2468][048])([-\\/\\._]?)(0?2)"
+ "([-\\/\\._]?)(29)$)|(^([1][89][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|"
+ "(^([2-9][0-9][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$))";
/***
* 日期正则 支持: YYYY-MM-DD
*/
public static final String DATE_FORMAT1 = "(([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8]))))|((([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[3579][26])00))-02-29)";
/**
* URL正则表达式 匹配 http www ftp
*/
public static final String URL = "^(http|www|ftp|)?(://)?(\\w+(-\\w+)*)(\\.(\\w+(-\\w+)*))*((:\\d+)?)(/(\\w+(-\\w+)*))*(\\.?(\\w)*)(\\?)?"
+ "(((\\w*%)*(\\w*\\?)*(\\w*:)*(\\w*\\+)*(\\w*\\.)*(\\w*&)*(\\w*-)*(\\w*=)*(\\w*%)*(\\w*\\?)*"
+ "(\\w*:)*(\\w*\\+)*(\\w*\\.)*"
+ "(\\w*&)*(\\w*-)*(\\w*=)*)*(\\w*)*)$";
/**
* 身份证正则表达式
*/
public static final String IDCARD = "((11|12|13|14|15|21|22|23|31|32|33|34|35|36|37|41|42|43|44|45|46|50|51|52|53|54|61|62|63|64|65)[0-9]{4})"
+ "(([1|2][0-9]{3}[0|1][0-9][0-3][0-9][0-9]{3}"
+ "[Xx0-9])|([0-9]{2}[0|1][0-9][0-3][0-9][0-9]{3}))";
/**
* 机构代码
*/
public static final String JIGOU_CODE = "^[A-Z0-9]{8}-[A-Z0-9]$";
/**
* 匹配数字组成的字符串 ^[0-9]+$
*/
public static final String STR_NUM = "^[0-9]+$";
// //------------------验证方法
/**
* 判断字段是否为空 符合返回true
* @param str
* @return boolean
*/
public static synchronized boolean StrisNull(String str) {
return null == str || str.trim().length() <= 0 ? true : false;
}
/**
* 判断字段是非空 符合返回true
* @param str
* @return boolean
*/
public static boolean StrNotNull(String str) {
return !StrisNull(str);
}
/**
* 字符串null转空
* @param str
* @return boolean
*/
public static String nulltoStr(String str) {
return StrisNull(str) ? "" : str;
}
/**
* 字符串null赋值默认值
* @param str 目标字符串
* @param defaut 默认值
* @return String
*/
public static String nulltoStr(String str, String defaut) {
return StrisNull(str) ? defaut : str;
}
/**
* 判断字段是否为Email 符合返回true
* @param str
* @return boolean
*/
public static boolean isEmail(String str) {
return Regular(str, EMAIL);
}
/**
* 判断是否为电话号码 符合返回true
* @param str
* @return boolean
*/
public static boolean isPhone(String str) {
return Regular(str, PHONE);
}
/**
* 判断是否为手机号码 符合返回true
* @param str
* @return boolean
*/
public static boolean isMobile(String str) {
return RegularSJHM(str, MOBILE);
}
/**
* 判断是否为Url 符合返回true
* @param str
* @return boolean
*/
public static boolean isUrl(String str) {
return Regular(str, URL);
}
/**
* 判断字段是否为数字 正负整数 正负浮点数 符合返回true
* @param str
* @return boolean
*/
public static boolean isNumber(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为INTEGER 符合返回true
* @param str
* @return boolean
*/
public static boolean isInteger(String str) {
return Regular(str, INTEGER);
}
/**
* 判断字段是否为正整数正则表达式 >=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isINTEGER_NEGATIVE(String str) {
return Regular(str, INTEGER_NEGATIVE);
}
/**
* 判断字段是否为负整数正则表达式 <=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isINTEGER_POSITIVE(String str) {
return Regular(str, INTEGER_POSITIVE);
}
/**
* 判断字段是否为DOUBLE 符合返回true
* @param str
* @return boolean
*/
public static boolean isDouble(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为正浮点数正则表达式 >=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isDOUBLE_NEGATIVE(String str) {
return Regular(str, DOUBLE_NEGATIVE);
}
/**
* 判断字段是否为负浮点数正则表达式 <=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isDOUBLE_POSITIVE(String str) {
return Regular(str, DOUBLE_POSITIVE);
}
/**
* 判断字段是否为日期 符合返回true
* @param str
* @return boolean
*/
public static boolean isDate(String str) {
return Regular(str, DATE_ALL);
}
/**
* 验证
* @param str
* @return
*/
public static boolean isDate1(String str) {
return Regular(str, DATE_FORMAT1);
}
/**
* 判断字段是否为年龄 符合返回true
* @param str
* @return boolean
*/
public static boolean isAge(String str) {
return Regular(str, AGE);
}
/**
* 判断字段是否超长 字串为空返回false, 超过长度{leng}返回true 反之返回false
* @param str
* @param leng
* @return boolean
*/
public static boolean isLengOut(String str, int leng) {
return StrisNull(str) ? false : str.trim().length() > leng;
}
/**
* 判断字段是否为身份证 符合返回true
* @param str
* @return boolean
*/
public static boolean isIdCard(String str) {
if (StrisNull(str))
return false;
if (str.trim().length() == 15 || str.trim().length() == 18) {
return Regular(str, IDCARD);
} else {
return false;
}
}
/**
* 判断字段是否为邮编 符合返回true
* @param str
* @return boolean
*/
public static boolean isCode(String str) {
return Regular(str, CODE);
}
/**
* 判断字符串是不是全部是英文字母
* @param str
* @return boolean
*/
public static boolean isEnglish(String str) {
return Regular(str, STR_ENG);
}
/**
* 判断字符串是不是全部是英文字母+数字
* @param str
* @return boolean
*/
public static boolean isENG_NUM(String str) {
return Regular(str, STR_ENG_NUM);
}
/**
* 判断字符串是不是全部是英文字母+数字+下划线
* @param str
* @return boolean
*/
public static boolean isENG_NUM_(String str) {
return Regular(str, STR_ENG_NUM_);
}
/**
* 过滤特殊字符串 返回过滤后的字符串
* @param str
* @return boolean
*/
public static String filterStr(String str) {
Pattern p = Pattern.compile(STR_SPECIAL);
Matcher m = p.matcher(str);
return m.replaceAll("").trim();
}
/**
* 校验机构代码格式
* @return
*/
public static boolean isJigouCode(String str) {
return Regular(str, JIGOU_CODE);
}
/**
* 判断字符串是不是数字组成
* @param str
* @return boolean
*/
public static boolean isSTR_NUM(String str) {
return Regular(str, STR_NUM);
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean Regular(String str, String pattern) {
if (null == str || str.trim().length() <= 0)
return false;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean RegularSJHM(String str, String pattern) {
if (null == str || str.trim().length() <= 0){
return false;
}
if(str.contains("+86")){
str=str.replace("+86","");
}
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* description:匹配yyyyMMddHHmmss格式时间
* @param time
* @return boolean
*/
public static final String yyyyMMddHHmmss = "[0-9]{14}";
public static boolean isyyyyMMddHHmmss(String time) {
if (time == null) {
return false;
}
boolean bool = time.matches(yyyyMMddHHmmss);
return bool;
}
/**
* description:匹配MAC地址格式(aa-bb-cc-dd-ee-ff,连字符分隔)
* @param mac
* @return boolean
*/
public static final String isMac = "^[A-Fa-f0-9]{2}(-[A-Fa-f0-9]{2}){5}$";
public static boolean isMac(String mac) {
if (mac == null) {
return false;
}
boolean bool = mac.matches(isMac);
return bool;
}
/**
* description:匹配10位(秒级)时间戳
* @param timestamp
* @return boolean
*/
public static final String longtime = "[0-9]{10}";
public static boolean isTimestamp(String timestamp) {
if (timestamp == null) {
return false;
}
boolean bool = timestamp.matches(longtime);
return bool;
}
/**
* 判断字段是否为datatype 符合返回true
* @param str
* @return boolean
*/
public static final String DATATYPE = "^\\d{7}$";
public static boolean isDATATYPE(String str) {
return Regular(str, DATATYPE);
}
/**
* 判断字段是否为QQ 符合返回true
* @param str
* @return boolean
*/
public static final String QQ = "^\\d{5,15}$";
public static boolean isQQ(String str) {
return Regular(str, QQ);
}
/**
* 判断字段是否为IMSI 符合返回true
* @param str
* @return boolean
*/
public static final String IMSI = "^4600[0,1,2,3,4,5,6,7,9]\\d{10}|(46011|46020)\\d{10}$";
public static boolean isIMSI(String str) {
return Regular(str, IMSI);
}
/**
* 判断字段是否为IMEI 符合返回true
* @param str
* @return boolean
*/
public static final String IMEI = "^\\d{8}$|^[a-fA-F0-9]{14}$|^\\d{15}$";
public static boolean isIMEI(String str) {return Regular(str, IMEI);}
/**
* 判断字段是否为CAPTURETIME 符合返回true
* @param str
* @return boolean
*/
public static final String CAPTURETIME = "^\\d{10}|(20[0-9][0-9])\\d{10}$";
public static boolean isCAPTURETIME(String str) {return Regular(str, CAPTURETIME);}
/**
* description:检测认证类型
* @param str
* @return boolean
*/
public static final String AUTH_TYPE = "^\\d{7}$";
public static boolean isAUTH_TYPE(String str) {return Regular(str, AUTH_TYPE);}
/**
* description:检测FIRM_CODE
* @param str
* @return boolean
*/
public static final String FIRM_CODE = "^\\d{9}$";
public static boolean isFIRM_CODE(String str) {return Regular(str, FIRM_CODE);}
/**
* description:检测经度
* @param str
* @return boolean
*/
public static final String LONGITUDE = "^-?(([1-9]\\d?)|(1[0-7]\\d)|180)(\\.\\d{1,6})?$";
//public static final String LONGITUDE ="^([-]?(\\d|([1-9]\\d)|(1[0-7]\\d)|(180))(\\.\\d*)\\,[-]?(\\d|([1-8]\\d)|(90))(\\.\\d*))$";
public static boolean isLONGITUDE(String str) {return Regular(str, LONGITUDE);}
/**
* description:检测纬度
*
* @param str
* @return boolean
*/
public static final String LATITUDE = "^-?(([1-8]\\d?)|([1-8]\\d)|90)(\\.\\d{1,6})?$";
public static boolean isLATITUDE(String str) {return Regular(str, LATITUDE);}
public static void main(String[] args) {
boolean bool = isLATITUDE("25.546685");
System.out.println(bool);
}
}
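To sanity-check the patterns above, here is a minimal usage sketch of the Validation helpers (all inputs are made-up test values; the package name follows the import used later in this project):
import com.hsiehchou.flume.utils.Validation;
public class ValidationDemo {
public static void main(String[] args) {
// MAC must be "-"-separated hex pairs, per the isMac pattern above
System.out.println(Validation.isMac("aa-bb-cc-dd-ee-ff"));          // true
System.out.println(Validation.isMac("aa:bb:cc:dd:ee:ff"));          // false
// IMEI: 8 digits, 14 hex characters, or 15 digits
System.out.println(Validation.isIMEI("000000000000000"));           // true (15 digits)
// 14-digit time string yyyyMMddHHmmss
System.out.println(Validation.isyyyyMMddHHmmss("20190518120000"));  // true
// Longitude must stay within [-180, 180]
System.out.println(Validation.isLONGITUDE("181.000000"));           // false
}
}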
5、constant常量
constant/FlumeConfConstant.java
package com.hsiehchou.flume.constant;
public class FlumeConfConstant {
//flumeSource配置
public static final String UNMANAGE="unmanage";
public static final String DIRS="dirs";
public static final String SUCCESSFILE="successfile";
public static final String ALL="all";
public static final String SOURCE="source";
public static final String FILENUM="filenum";
public static final String SLEEPTIME="sleeptime";
//ESSINK配置
public static final String TIMECELL="timecell";
public static final String MAXNUM="maxnum";
public static final String SINK_SOURCE="source";
public static final String THREADNUM="threadnum";
public static final String REDISHOST="redishost";
}
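These keys correspond to the agent configuration shown in section 9 below (dirs, successfile, filenum, sleeptime and so on). Purely as an illustration, and not the actual FolderSource implementation, a configure() method that reads them could look like this:
import org.apache.flume.Context;
import com.hsiehchou.flume.constant.FlumeConfConstant;
// Sketch only: shows how the constants map to the agent config keys
// (field names and defaults here are assumptions, not the real FolderSource code)
public class FolderSourceConfigSketch {
private String[] dirs;
private String successFile;
private int fileNum;
private long sleepTime;
public void configure(Context context) {
dirs = context.getString(FlumeConfConstant.DIRS).split(",");        // tier1.sources.source1.dirs
successFile = context.getString(FlumeConfConstant.SUCCESSFILE);     // tier1.sources.source1.successfile
fileNum = context.getInteger(FlumeConfConstant.FILENUM, 3000);      // tier1.sources.source1.filenum
sleepTime = context.getLong(FlumeConfConstant.SLEEPTIME, 5L);       // tier1.sources.source1.sleeptime
}
}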
constant/TxtConstant.java
package com.hsiehchou.flume.constant;
public class TxtConstant {
public static final String TYPE_ES="TYPE_ES";
public static final String STATIONCENTER="STATIONCENTER";
public static final String APCENTER="APCENTER";
public static final String IPLOGINLOG="IPLOGINLOG";
public static final String IMSIIMEI="IMSIIMEI";
public static final String MACHOUR="MACHOUR";
public static final String TYPE_SITEMANAGE="TYPE_SITEMANAGE";
public static final String JZWA="JZWA";
public static final String FIRMCODE="FIRMCODE";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS2="FILENAME_FIELDS2";
public static final String FILENAME_FIELDS3="FILENAME_FIELDS3";
public static final String FILENAME_FIELDS4="FILENAME_FIELDS4";
public static final String FILENAME_FIELDS5="FILENAME_FIELDS5";
public static final String FILENAME_VALIDATION="FILENAME_VALIDATION";
public static final String AUTHTYPE_LIST="AUTHTYPE_LIST";
public static final String SOURCE_FEIJING="SOURCE_FEIJING";
public static final String SOURCE_650="SOURCE_650";
public static final String OFFICE_11="OFFICE_11";
public static final String OFFICE_12="OFFICE_12";
public static final String WLZK="WLZK";
public static final String FEIJING="FEIJING";
public static final String HLWZC="HLWZC";
public static final String WIFIWL="WIFIWL";
// 错误索引
public static final String ERROR_INDEX="es.errorindex";
public static final String ERROR_TYPE="es.errortype";
//WIFI索引
public static final String WIFILOG_INDEX="es.index.wifilog";
public static final String IPLOGINLOG_TYPE="es.type.iploginlog";
public static final String EMAIL_TYPE="es.type.email";
public static final String FTP_TYPE="es.type.ftp";
public static final String GAME_TYPE="es.type.game";
public static final String HEARTBEAT_TYPE="es.type.heartbeat";
public static final String HTTP_TYPE="es.type.http";
public static final String IMINFO_TYPE="es.type.iminfo";
public static final String ORGANIZATION_TYPE="es.type.organization";
public static final String SEARCH_TYPE="es.type.search";
public static final String IMSIIMEI_TYPE="es.type.imsiimei";
}
6、field字段
field/ErrorMapFields.java
package com.hsiehchou.flume.fields;
public class ErrorMapFields {
public static final String RKSJ="RKSJ";
public static final String RECORD="RECORD";
public static final String LENGTH="LENGTH";
public static final String LENGTH_ERROR="LENGTH_ERROR";
public static final String LENGTH_ERROR_NUM="10001";
public static final String FILENAME="FILENAME";
public static final String FILENAME_ERROR="FILENAME_ERROR";
public static final String FILENAME_ERROR_NUM="10010";
public static final String ABSOLUTE_FILENAME="ABSOLUTE_FILENAME";
public static final String SJHM="SJHM";
public static final String SJHM_ERROR="SJHM_ERROR";
public static final String SJHM_ERRORCODE="10007";
public static final String DATA_TYPE="DATA_TYPE";
public static final String DATA_TYPE_ERROR="DATA_TYPE_ERROR";
public static final String DATA_TYPE_ERRORCODE="10011";
public static final String QQ="QQ";
public static final String QQ_ERROR="QQ_ERROR";
public static final String QQ_ERRORCODE="10002";
public static final String IMSI="IMSI";
public static final String IMSI_ERROR="IMSI_ERROR";
public static final String IMSI_ERRORCODE="10005";
public static final String IMEI="IMEI";
public static final String IMEI_ERROR="IMEI_ERROR";
public static final String IMEI_ERRORCODE="10006";
public static final String MAC="MAC";
public static final String CLIENTMAC="CLIENTMAC";
public static final String STATIONMAC="STATIONMAC";
public static final String BSSID="BSSID";
public static final String MAC_ERROR="MAC_ERROR";
public static final String MAC_ERRORCODE="10003";
public static final String DEVICENUM="DEVICENUM";
public static final String DEVICENUM_ERROR="DEVICENUM_ERROR";
public static final String DEVICENUM_ERRORCODE="10014";
public static final String CAPTURETIME="CAPTURETIME";
public static final String CAPTURETIME_ERROR="CAPTURETIME_ERROR";
public static final String CAPTURETIME_ERRORCODE="10019";
public static final String EMAIL="EMAIL";
public static final String EMAIL_ERROR="EMAIL_ERROR";
public static final String EMAIL_ERRORCODE="10004";
public static final String AUTH_TYPE="AUTH_TYPE";
public static final String AUTH_TYPE_ERROR="AUTH_TYPE_ERROR";
public static final String AUTH_TYPE_ERRORCODE="10020";
public static final String FIRM_CODE="FIRM_CODE";
public static final String FIRMCODE_NUM="FIRMCODE_NUM";
public static final String FIRM_CODE_ERROR="FIRM_CODE_ERROR";
public static final String FIRM_CODE_ERRORCODE="10009";
public static final String STARTTIME="STARTTIME";
public static final String STARTTIME_ERROR="STARTTIME_ERROR";
public static final String STARTTIME_ERRORCODE="10015";
public static final String ENDTIME="ENDTIME";
public static final String ENDTIME_ERROR="ENDTIME_ERROR";
public static final String ENDTIME_ERRORCODE="10016";
public static final String LOGINTIME="LOGINTIME";
public static final String LOGINTIME_ERROR="LOGINTIME_ERROR";
public static final String LOGINTIME_ERRORCODE="10017";
public static final String LOGOUTTIME="LOGOUTTIME";
public static final String LOGOUTTIME_ERROR="LOGOUTTIME_ERROR";
public static final String LOGOUTTIME_ERRORCODE="10018";
public static final String LONGITUDE="LONGITUDE";
public static final String LONGITUDE_ERROR="LONGITUDE_ERROR";
public static final String LONGITUDE_ERRORCODE="10012";
public static final String LATITUDE="LATITUDE";
public static final String LATITUDE_ERROR="LATITUDE_ERROR";
public static final String LATITUDE_ERRORCODE="10013";
//TODO 其他类型DATA_TYPE 记录
public static final String DATA_TYPE_OTHER="DATA_TYPE_OTHER";
public static final String DATA_TYPE_OTHER_ERROR="DATA_TYPE_OTHER_ERROR";
public static final String DATA_TYPE_OTHER_ERRORCODE="10022";
//TODO USERNAME 错误
public static final String USERNAME="USERNAME";
public static final String USERNAME_ERROR="USERNAME_ERROR";
public static final String USERNAME_ERRORCODE="10023";
}
field/MapFields.java
package com.hsiehchou.flume.fields;
public class MapFields {
public static final String ID="id";
public static final String SOURCE="source";
public static final String TYPE="TYPE";
public static final String TABLE="table";
public static final String FILENAME="filename";
public static final String RKSJ="rksj";
public static final String ABSOLUTE_FILENAME="absolute_filename";
public static final String BSSID="BSSID";
public static final String USERNAME="USERNAME";
public static final String DAYID="DAYID";
public static final String FIRMCODE_NUM="FIRMCODE_NUM";
public static final String FIRM_CODE="FIRM_CODE";
public static final String IMEI="IMEI";
public static final String IMSI="IMSI";
public static final String DATA_TYPE_NAME="DATA_TYPE_NAME";
public static final String AUTH_TYPE="AUTH_TYPE";
public static final String AUTH_ACCOUNT="AUTH_ACCOUNT";
//TODO 时间类参数
public static final String CAPTURETIME="CAPTURETIME";
public static final String LOGINTIME="LOGINTIME";
public static final String LOGOUTTIME="LOGOUTTIME";
public static final String STARTTIME="STARTTIME";
public static final String ENDTIME="ENDTIME";
public static final String FIRSTTIME="FIRSTTIME";
public static final String LASTTIME="LASTTIME";
//TODO 去重参数
public static final String COUNT="COUNT";
public static final String DATA_TYPE="DATA_TYPE";
public static final String VALUE="value";
public static final String SITECODE="SITECODE";
public static final String SITECODENEW="SITECODENEW";
public static final String DEVICENUM="DEVICENUM";
public static final String MAC="MAC";
public static final String CLIENTMAC="CLIENTMAC";
public static final String STATIONMAC="STATIONMAC";
public static final String BRAND="BRAND";
public static final String INDEX="INDEX";
public static final String ACTION_TYPE="ACTION_TYPE";
public static final String CITY_CODE="CITY_CODE";
/* public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";*/
}
7、自定义sink
sink/KafkaSink.java—将数据下沉到kafka
package com.hsiehchou.flume.sink;
import com.google.common.base.Throwables;
import com.hsiehchou.kafka.producer.StringProducer;
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.apache.log4j.Logger;
import java.util.ArrayList;
import java.util.List;
public class KafkaSink extends AbstractSink implements Configurable {
private final Logger logger = Logger.getLogger(KafkaSink.class);
private String[] kafkatopics = null;
//private List<KeyedMessage<String,String>> listKeyedMessage=null;
private List<String> listKeyedMessage=null;
private Long proTimestamp=System.currentTimeMillis();
/**
* 配置读取
* @param context
*/
@Override
public void configure(Context context) {
//tier1.sinks.sink1.kafkatopics=chl_test7
//获取 推送kafkatopic参数
kafkatopics = context.getString("kafkatopics").split(",");
logger.info("获取kafka topic配置" + context.getString("kafkatopics"));
listKeyedMessage=new ArrayList<>();
}
@Override
public Status process() throws EventDeliveryException {
logger.info("sink开始执行");
Channel channel = getChannel();
Transaction transaction = channel.getTransaction();
transaction.begin();
try {
//从channel中拿到event
Event event = channel.take();
if (event == null) {
transaction.rollback();
return Status.BACKOFF;
}
// 解析记录 获取事件内容
String recourd = new String(event.getBody());
// 发送数据到kafka
try {
//调用kafka的消息推送,将数据推送到kafka
StringProducer.producer(kafkatopics[0],recourd);
/* if(listKeyedMessage.size()>1000){
logger.info("数据大与10000,推送数据到kafka");
sendListKeyedMessage();
logger.info("数据大与10000,推送数据到kafka成功");
}else if(System.currentTimeMillis()-proTimestamp>=60*1000){
logger.info("时间间隔大与60,推送数据到kafka");
sendListKeyedMessage();
logger.info("时间间隔大与60,推送数据到kafka成功"+listKeyedMessage.size());
}*/
} catch (Exception e) {
logger.error("推送数据到kafka失败" , e);
throw Throwables.propagate(e);
}
transaction.commit();
return Status.READY;
} catch (ChannelException e) {
logger.error(e);
transaction.rollback();
return Status.BACKOFF;
} finally {
if(transaction != null){
transaction.close();
}
}
}
@Override
public synchronized void stop() {
super.stop();
}
/*private void sendListKeyedMessage(){
Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
producer.send(listKeyedMessage);
listKeyedMessage.clear();
proTimestamp=System.currentTimeMillis();
producer.close();
}*/
}
8、service
DataCheck.java—数据校验
package com.hsiehchou.flume.service;
import com.alibaba.fastjson.JSON;
import com.hsiehchou.common.net.HttpRequest;
import com.hsiehchou.common.project.datatype.DataTypeProperties;
import com.hsiehchou.common.time.TimeTranstationUtils;
import com.hsiehchou.flume.fields.ErrorMapFields;
import com.hsiehchou.flume.fields.MapFields;
import org.apache.log4j.Logger;
import java.util.*;
/**
* 数据校验
*/
public class DataCheck {
private final static Logger LOG = Logger.getLogger(DataCheck.class);
/**
* 获取数据类型对应的字段 对应的文件
* 结构为 [ 数据类型1 = [字段1,字段2。。。。],
* 数据类型2 = [字段1,字段2。。。。]]
*/
private static Map<String, ArrayList<String>> dataMap = DataTypeProperties.dataTypeMap;
/**
* 数据解析
* @param line
* @param fileName
* @param absoluteFilename
* @return
*/
public static Map<String, String> txtParse(String line, String fileName, String absoluteFilename) {
Map<String, String> map = new HashMap<String, String>();
String[] fileNames = fileName.split("_");
String dataType = fileNames[0];
if (dataMap.containsKey(dataType)) {
List<String> fields = dataMap.get(dataType.toLowerCase());
String[] splits = line.split("\t");
//长度校验
if (fields.size() == splits.length) {
//添加公共字段
map.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
map.put(MapFields.TABLE, dataType.toLowerCase());
map.put(MapFields.RKSJ, (System.currentTimeMillis() / 1000) + "");
map.put(MapFields.FILENAME, fileName);
map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
for (int i = 0; i < splits.length; i++) {
map.put(fields.get(i), splits[i]);
}
} else {
map = null;
LOG.error("字段长度不匹配fields"+fields.size() + "/t" + splits.length);
}
} else {
map = null;
LOG.error("配置文件中不存在此数据类型");
}
return map;
}
/**
* 数据长度校验添加必要字段并转map,将长度不符合的插入ES数据库
* @param line
* @param fileName
* @param absoluteFilename
* @return
*/
public static Map<String, String> txtParseAndalidation(String line, String fileName, String absoluteFilename) {
Map<String, String> map = new HashMap<String, String>();
Map<String, Object> errorMap = new HashMap<String, Object>();
//文件名按"_"切分 wechat_source1_1111142.txt
//wechat 数据类型
//source1 数据来源
//1111142 不让文件名相同
String[] fileNames = fileName.split("_");
String dataType = fileNames[0];
String source = fileNames[1];
if (dataMap.containsKey(dataType)) {
//获取数据类型字段
// imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
//根据数据类型,获取该类型的字段
List<String> fields = dataMap.get(dataType.toLowerCase());
//line
String[] splits = line.split("\t");
//长度校验
if (fields.size() == splits.length) {
for (int i = 0; i < splits.length; i++) {
map.put(fields.get(i), splits[i]);
}
//添加公共字段
// map.put(SOURCE, source);
map.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
map.put(MapFields.TABLE, dataType.toLowerCase());
map.put(MapFields.RKSJ, (System.currentTimeMillis() / 1000) + "");
map.put(MapFields.FILENAME, fileName);
map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
//数据封装完成 开始进行数据校验
errorMap = DataValidation.dataValidation(map);
} else {
errorMap.put(ErrorMapFields.LENGTH, "字段数不匹配 实际" + fields.size() + "\t" + "结果" + splits.length);
errorMap.put(ErrorMapFields.LENGTH_ERROR, ErrorMapFields.LENGTH_ERROR_NUM);
LOG.info("字段数不匹配 实际" + fields.size() + "\t" + "结果" + splits.length);
map = null;
}
//判断数据是否存在错误
if (null != errorMap && errorMap.size() > 0) {
LOG.info("errorMap===" + errorMap);
if ("1".equals("1")) {
//addErrorMapES(errorMap, map, fileName, absoluteFilename);
//验证没通过,将错误数据写到ES,并将map置空
addErrorMapESByHTTP(errorMap, map, fileName, absoluteFilename);
}
map = null;
}
} else {
map = null;
LOG.error("配置文件中不存在此数据类型");
}
return map;
}
/**
* 将错误信息写入ES,方便查错
* @param errorMap
* @param map
* @param fileName
* @param absoluteFilename
*/
public static void addErrorMapESByHTTP(Map<String, Object> errorMap, Map<String, String> map, String fileName, String absoluteFilename) {
String errorType = fileName.split("_")[0];
errorMap.put(MapFields.TABLE, errorType);
errorMap.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
errorMap.put(ErrorMapFields.RECORD, map);
errorMap.put(ErrorMapFields.FILENAME, fileName);
errorMap.put(ErrorMapFields.ABSOLUTE_FILENAME, absoluteFilename);
errorMap.put(ErrorMapFields.RKSJ, TimeTranstationUtils.Date2yyyy_MM_dd_HH_mm_ss());
String url="http://192.168.116.201:9200/error_recourd/error_recourd/"+ errorMap.get(MapFields.ID).toString();
String json = JSON.toJSONString(errorMap);
HttpRequest.sendPost(url,json);
//HttpRequest.sendPostMessage(url, errorMap);
}
/*
public static void addErrorMapES(Map<String, Object> errorMap, Map<String, String> map, String fileName, String absoluteFilename) {
String errorType = fileName.split("_")[0];
errorMap.put(MapFields.TABLE, errorType);
errorMap.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
errorMap.put(ErrorMapFields.RECORD, map);
errorMap.put(ErrorMapFields.FILENAME, fileName);
errorMap.put(ErrorMapFields.ABSOLUTE_FILENAME, absoluteFilename);
errorMap.put(ErrorMapFields.RKSJ, TimeTranstationUtils.Date2yyyy_MM_dd_HH_mm_ss());
TransportClient client = null;
try {
LOG.info("开始获取客户端===============================" + errorMap);
client = ESClientUtils.getClient();
} catch (Throwable t) {
if (t instanceof Error) {
throw (Error)t;
}
LOG.error(null,t);
}
//JestClient jestClient = JestService.getJestClient();
//boolean bool = JestService.indexOne(jestClient,TxtConstant.ERROR_INDEX, TxtConstant.ERROR_TYPE,errorMap.get(MapFields.ID).toString(),errorMap);
LOG.info("开始写入错误数据到ES===============================" + errorMap);
boolean bool = IndexUtil.putIndexData(TxtConstant.ERROR_INDEX, TxtConstant.ERROR_TYPE, errorMap.get(MapFields.ID).toString(), errorMap,client);
if(bool){
LOG.info("写入错误数据到ES===============================" + errorMap);
}else{
LOG.info("写入错误数据到ES===============================失败");
}
}*/
public static void main(String[] args) {
}
}
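To make the parsing logic concrete, here is a small sketch that feeds one wechat line into txtParse; the field order follows the comment above and every value is made-up test data:
import java.util.Map;
import com.hsiehchou.flume.service.DataCheck;
public class DataCheckDemo {
public static void main(String[] args) {
// One tab-separated wechat record: 14 fields in the order
// imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,
// username,phone,object_username,send_message,accept_message,message_time
String line = String.join("\t",
"000000000000000", "460011418603055", "116.123456", "39.123456",
"aa-aa-aa-aa-aa-aa", "bb-bb-bb-bb-bb-bb", "32109231", "1557305988",
"andiy", "18609765432", "judy", "hi", "ok", "1557305988");
// File name pattern: datatype_source_uuid.txt
Map<String, String> record = DataCheck.txtParse(line,
"wechat_source1_1111142.txt",
"/usr/chl/data/filedir/wechat_source1_1111142.txt");
// On success the map holds the 14 business fields plus the common fields
// id, table, rksj, filename and absolute_filename added by txtParse
System.out.println(record);
}
}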
DataValidation.java
package com.hsiehchou.flume.service;
import com.hsiehchou.flume.fields.ErrorMapFields;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.utils.Validation;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class DataValidation {
private static final Logger LOG = LoggerFactory.getLogger(DataValidation.class);
// private static final TxtConfigurationFileReader reader = TxtConfigurationFileReader.getInstance();
// private static final DataTypeConfigurationFileReader datatypereader = DataTypeConfigurationFileReader.getInstance();
// private static final ValidationConfigurationFileReader readerValidation = ValidationConfigurationFileReader.getInstance();
private static Map<String,String> dataTypeMap;
private static List<String> listAuthType;
private static String isErrorES;
private static final String USERNAME=ErrorMapFields.USERNAME;
private static final String DATA_TYPE=ErrorMapFields.DATA_TYPE;
private static final String DATA_TYPE_ERROR=ErrorMapFields.DATA_TYPE_ERROR;
private static final String DATA_TYPE_ERRORCODE=ErrorMapFields.DATA_TYPE_ERRORCODE;
private static final String SJHM=ErrorMapFields.SJHM;
private static final String SJHM_ERROR=ErrorMapFields.SJHM_ERROR;
private static final String SJHM_ERRORCODE=ErrorMapFields.SJHM_ERRORCODE;
private static final String QQ=ErrorMapFields.QQ;
private static final String QQ_ERROR=ErrorMapFields.QQ_ERROR;
private static final String QQ_ERRORCODE=ErrorMapFields.QQ_ERRORCODE;
private static final String IMSI=ErrorMapFields.IMSI;
private static final String IMSI_ERROR=ErrorMapFields.IMSI_ERROR;
private static final String IMSI_ERRORCODE=ErrorMapFields.IMSI_ERRORCODE;
private static final String IMEI=ErrorMapFields.IMEI;
private static final String IMEI_ERROR=ErrorMapFields.IMEI_ERROR;
private static final String IMEI_ERRORCODE=ErrorMapFields.IMEI_ERRORCODE;
private static final String MAC=ErrorMapFields.MAC;
private static final String CLIENTMAC=ErrorMapFields.CLIENTMAC;
private static final String STATIONMAC=ErrorMapFields.STATIONMAC;
private static final String BSSID=ErrorMapFields.BSSID;
private static final String MAC_ERROR=ErrorMapFields.MAC_ERROR;
private static final String MAC_ERRORCODE=ErrorMapFields.MAC_ERRORCODE;
private static final String DEVICENUM=ErrorMapFields.DEVICENUM;
private static final String DEVICENUM_ERROR=ErrorMapFields.DEVICENUM_ERROR;
private static final String DEVICENUM_ERRORCODE=ErrorMapFields.DEVICENUM_ERRORCODE;
private static final String CAPTURETIME=ErrorMapFields.CAPTURETIME;
private static final String CAPTURETIME_ERROR=ErrorMapFields.CAPTURETIME_ERROR;
private static final String CAPTURETIME_ERRORCODE=ErrorMapFields.CAPTURETIME_ERRORCODE;
private static final String EMAIL=ErrorMapFields.EMAIL;
private static final String EMAIL_ERROR=ErrorMapFields.EMAIL_ERROR;
private static final String EMAIL_ERRORCODE=ErrorMapFields.EMAIL_ERRORCODE;
private static final String AUTH_TYPE=ErrorMapFields.AUTH_TYPE;
private static final String AUTH_TYPE_ERROR=ErrorMapFields.AUTH_TYPE_ERROR;
private static final String AUTH_TYPE_ERRORCODE=ErrorMapFields.AUTH_TYPE_ERRORCODE;
private static final String FIRM_CODE=ErrorMapFields.FIRM_CODE;
private static final String FIRM_CODE_ERROR=ErrorMapFields.FIRM_CODE_ERROR;
private static final String FIRM_CODE_ERRORCODE=ErrorMapFields.FIRM_CODE_ERRORCODE;
private static final String STARTTIME=ErrorMapFields.STARTTIME;
private static final String STARTTIME_ERROR=ErrorMapFields.STARTTIME_ERROR;
private static final String STARTTIME_ERRORCODE=ErrorMapFields.STARTTIME_ERRORCODE;
private static final String ENDTIME=ErrorMapFields.ENDTIME;
private static final String ENDTIME_ERROR=ErrorMapFields.ENDTIME_ERROR;
private static final String ENDTIME_ERRORCODE=ErrorMapFields.ENDTIME_ERRORCODE;
private static final String LOGINTIME=ErrorMapFields.LOGINTIME;
private static final String LOGINTIME_ERROR=ErrorMapFields.LOGINTIME_ERROR;
private static final String LOGINTIME_ERRORCODE=ErrorMapFields.LOGINTIME_ERRORCODE;
private static final String LOGOUTTIME=ErrorMapFields.LOGOUTTIME;
private static final String LOGOUTTIME_ERROR=ErrorMapFields.LOGOUTTIME_ERROR;
private static final String LOGOUTTIME_ERRORCODE=ErrorMapFields.LOGOUTTIME_ERRORCODE;
public static Map<String, Object> dataValidation( Map<String, String> map){
if(map == null){
return null;
}
Map<String,Object> errorMap = new HashMap<String,Object>();
//验证手机号码
sjhmValidation(map,errorMap);
//验证MAC
macValidation(map,errorMap);
//验证经纬度
longlaitValidation(map,errorMap);
//定义自己的清洗规则
//TODO 大小写统一
//TODO 时间类型统一
//TODO 数据字段统一
//TODO 业务字段转换
//TODO 数据矫正
//TODO 验证MAC不能为空
//TODO 验证IMSI不能为空
//TODO 验证 QQ IMSI IMEI
//TODO 验证DEVICENUM是否为空 为空返回错误
/*devicenumValidation(map,errorMap);
//TODO 验证CAPTURETIME是否为空 为空过滤 不为10,14位数字过滤
capturetimeValidation(map,errorMap);
//TODO 验证EMAIL
emailValidation(map,errorMap);
//TODO 验证STARTTIME ENDTIME LOGINTIME LOGOUTTIME
timeValidation(map,errorMap);
*/
return errorMap;
}
/**
* 手机号码验证
* @param map
* @param errorMap
*/
public static void sjhmValidation(Map<String, String> map,Map<String,Object> errorMap){
if(map.containsKey("phone")){
String sjhm=map.get("phone");
//调用正则做手机号码验证,是否是正确的一个,检验
boolean ismobile = Validation.isMobile(sjhm);
if(!ismobile){
errorMap.put(SJHM,sjhm);
errorMap.put(SJHM_ERROR,SJHM_ERRORCODE);
}
}
}
//TODO QQ验证 10002 QQ编码 1030001 需要根据DATATYPE来判断数据类型的一起验证
public static void virtualValidation(String dataType, Map<String, String> map,Map<String,Object> errorMap){
//TODO USERNAME验证 10023 长度》=2
if(map.containsKey(ErrorMapFields.USERNAME)){
String username=map.get(ErrorMapFields.USERNAME);
if(StringUtils.isNotBlank(username)){
if(username.length()<2){
errorMap.put(ErrorMapFields.USERNAME,username);
errorMap.put(ErrorMapFields.USERNAME_ERROR,ErrorMapFields.USERNAME_ERRORCODE);
}
}
}
//TODO QQ验证 10002 QQ编码 1030001
if("1030001".equals(dataType)&& map.containsKey(USERNAME)){
String qqnum= map.get(USERNAME);
boolean bool = Validation.isQQ(qqnum);
if(!bool){
errorMap.put(QQ,qqnum);
errorMap.put(QQ_ERROR,QQ_ERRORCODE);
}
}
//TODO IMSI验证 10005 IMSI编码 1429997
if("1429997".equals(dataType)&& map.containsKey(IMSI)){
String imsi= map.get(IMSI);
boolean bool = Validation.isIMSI(imsi);
if(!bool){
errorMap.put(IMSI,imsi);
errorMap.put(IMSI_ERROR,IMSI_ERRORCODE);
}
}
//TODO IMEI验证 10006 IMEI编码 1429998
if("1429998".equals(dataType)&& map.containsKey(IMEI)){
String imei= map.get(IMEI);
boolean bool = Validation.isIMEI(imei);
if(!bool){
errorMap.put(IMEI,imei);
errorMap.put(IMEI_ERROR,IMEI_ERRORCODE);
}
}
}
//MAC验证 10003
public static void macValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey("phone_mac")){
String mac=map.get("phone_mac");
if(StringUtils.isNotBlank(mac)){
boolean bool = Validation.isMac(mac);
if(!bool){
LOG.info("MAC验证失败");
errorMap.put(MAC,mac);
errorMap.put(MAC_ERROR,MAC_ERRORCODE);
}
}else{
LOG.info("MAC验证失败");
errorMap.put(MAC,mac);
errorMap.put(MAC_ERROR,MAC_ERRORCODE);
}
}
}
/**
* TODO DEVICENUM 验证 为空过滤
* @param map
* @param errorMap
*/
public static void devicenumValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey("device_number")){
String devicenum=map.get("device_number");
if(StringUtils.isBlank(devicenum)){
errorMap.put(DEVICENUM,"设备编码不能为空");
errorMap.put(DEVICENUM_ERROR,DEVICENUM_ERRORCODE);
}
}
}
/**
* TODO CAPTURETIME验证 为空过滤 10019 验证时间长度为10或14位
* @param map
* @param errorMap
*/
public static void capturetimeValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey(CAPTURETIME)){
String capturetime=map.get(CAPTURETIME);
if(StringUtils.isBlank(capturetime)){
errorMap.put(CAPTURETIME,"CAPTURETIME不能为空");
errorMap.put(CAPTURETIME_ERROR,CAPTURETIME_ERRORCODE);
}else{
boolean bool = Validation.isCAPTURETIME(capturetime);
if(!bool){
errorMap.put(CAPTURETIME,capturetime);
errorMap.put(CAPTURETIME_ERROR,CAPTURETIME_ERRORCODE);
}
}
}
}
//TODO EMAIL验证 为空过滤 为错误过滤 10004 通过TABLE取USERNAME验证
public static void emailValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.get("TABLE").equals(EMAIL)){
String email=map.get(USERNAME);
if(StringUtils.isNotBlank(email)){
boolean bool = Validation.isEmail(email);
if(!bool){
errorMap.put(EMAIL,email);
errorMap.put(EMAIL_ERROR,EMAIL_ERRORCODE);
}
}else{
errorMap.put(EMAIL,"EMAIL不能为空");
errorMap.put(EMAIL_ERROR,EMAIL_ERRORCODE);
}
}
}
//TODO EMAIL验证 为空过滤 为错误过滤 10004 通过TABLE取USERNAME验证
public static void timeValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey(STARTTIME)&&map.containsKey(ENDTIME)){
String starttime=map.get(STARTTIME);
String endtime=map.get(ENDTIME);
if(StringUtils.isBlank(starttime)&&StringUtils.isBlank(endtime)){
errorMap.put(STARTTIME,"STARTTIME和ENDTIME不能同时为空");
errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
errorMap.put(ENDTIME,"STARTTIME和ENDTIME不能同时为空");
errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
}else{
Boolean bool1 = istime(starttime, STARTTIME, STARTTIME_ERROR, STARTTIME_ERRORCODE, errorMap);
Boolean bool2 = istime(endtime, ENDTIME, ENDTIME_ERROR, ENDTIME_ERRORCODE, errorMap);
if(bool1&&bool2&&(starttime.length()!=endtime.length())){
errorMap.put(STARTTIME,"STARTTIME和ENDTIME长度不等 STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
errorMap.put(ENDTIME,"STARTTIME和ENDTIME长度不等 STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
}
else if(bool1&&bool2&&(endtime.compareTo(starttime)<0)){
errorMap.put(STARTTIME,"ENDTIME必须大于STARTTIME STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
errorMap.put(ENDTIME,"ENDTIME必须大于STARTTIME STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
}
}
}else if(map.containsKey(LOGINTIME)&&map.containsKey(LOGOUTTIME)){
String logintime=map.get(LOGINTIME);
String logouttime=map.get(LOGOUTTIME);
if(StringUtils.isBlank(logintime)&&StringUtils.isBlank(logouttime)){
errorMap.put(LOGINTIME,"LOGINTIME和LOGOUTTIME不能同时为空");
errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
errorMap.put(LOGOUTTIME,"LOGINTIME和LOGOUTTIME不能同时为空");
errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
}else{
Boolean bool1 = istime(logintime, LOGINTIME, LOGINTIME_ERROR, LOGINTIME_ERRORCODE, errorMap);
Boolean bool2 = istime(logouttime, LOGOUTTIME, LOGOUTTIME_ERROR, LOGOUTTIME_ERRORCODE, errorMap);
if(bool1&&bool2&&(logintime.length()!=logouttime.length())){
errorMap.put(LOGINTIME,"LOGOUTTIME LOGINTIME长度不等 LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
errorMap.put(LOGOUTTIME,"LOGOUTTIME LOGINTIME长度不等 LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
}
else if(bool1&&bool2&&(logouttime.compareTo(logintime)<0)){
errorMap.put(LOGINTIME,"LOGOUTTIME必须大于LOGINTIME LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
errorMap.put(LOGOUTTIME,"LOGOUTTIME必须大于LOGINTIME LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
}
}
}
}
//TODO AUTH_TYPE验证 为空过滤 为错误过滤 10020
public static void authtypeValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
String fileName=map.get(MapFields.FILENAME);
if(fileName.split("_").length<=2){
map = null;
return;
}
if(StringUtils.isNotBlank(fileName)){
if("bh".equals(fileName.split("_")[2])||"wy".equals(fileName.split("_")[2])||"yc".equals(fileName.split("_")[2])){
return ;
}else if(map.containsKey(AUTH_TYPE)){
String authtype=map.get(AUTH_TYPE);
if(StringUtils.isNotBlank(authtype)){
if(listAuthType.contains(authtype)){
if("1020004".equals(authtype)){
String sjhm=map.get(MapFields.AUTH_ACCOUNT);
boolean ismobile = Validation.isMobile(sjhm);
if(!ismobile){
errorMap.put(SJHM,sjhm);
errorMap.put(SJHM_ERROR,SJHM_ERRORCODE);
}
}
if("1020002".equals(authtype)){
String mac=map.get(MapFields.AUTH_ACCOUNT);
boolean ismac = Validation.isMac(mac);
if(!ismac){
errorMap.put(MAC,mac);
errorMap.put(MAC_ERROR,MAC_ERRORCODE);
}
}
}else{
errorMap.put(AUTH_TYPE,"AUTHTYPE_LIST 影射里没有"+ "\t"+ "["+ authtype+"]");
errorMap.put(AUTH_TYPE_ERROR,AUTH_TYPE_ERRORCODE);
}
}else{
errorMap.put(AUTH_TYPE,"AUTH_TYPE 不能为空");
errorMap.put(AUTH_TYPE_ERROR,AUTH_TYPE_ERRORCODE);
}
}
}
}
private static final String LONGITUDE = "longitude";
private static final String LATITUDE = "latitude";
private static final String LONGITUDE_ERROR=ErrorMapFields.LONGITUDE_ERROR;
private static final String LONGITUDE_ERRORCODE=ErrorMapFields.LONGITUDE_ERRORCODE;
private static final String LATITUDE_ERROR=ErrorMapFields.LATITUDE_ERROR;
private static final String LATITUDE_ERRORCODE=ErrorMapFields.LATITUDE_ERRORCODE;
/**
* 经纬度验证 错误过滤 10012 10013
* @param map
* @param errorMap
*/
public static void longlaitValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey(LONGITUDE)&&map.containsKey(LATITUDE)){
String longitude=map.get(LONGITUDE);
String latitude=map.get(LATITUDE);
boolean bool1 = Validation.isLONGITUDE(longitude);
boolean bool2 = Validation.isLATITUDE(latitude);
if(!bool1){
errorMap.put(LONGITUDE,longitude);
errorMap.put(LONGITUDE_ERROR,LONGITUDE_ERRORCODE);
}
if(!bool2){
errorMap.put(LATITUDE,latitude);
errorMap.put(LATITUDE_ERROR,LATITUDE_ERRORCODE);
}
}
}
public static Boolean istime(String time,String str1,String str2,String str3,Map<String,Object> errorMap){
if(StringUtils.isNotBlank(time)){
boolean bool = Validation.isCAPTURETIME(time);
if(!bool){
errorMap.put(str1,time);
errorMap.put(str2,str3);
return false;
}
return true;
}
return false;
}
}
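A minimal sketch of how dataValidation reports problems: the record below deliberately carries an invalid phone number and MAC address (all values are made-up), so the returned errorMap is non-empty:
import java.util.HashMap;
import java.util.Map;
import com.hsiehchou.flume.service.DataValidation;
public class DataValidationDemo {
public static void main(String[] args) {
Map<String, String> record = new HashMap<>();
record.put("phone", "123");                    // fails the mobile-number check
record.put("phone_mac", "zz-zz-zz-zz-zz-zz");  // fails the MAC format check
record.put("longitude", "116.123456");         // passes
record.put("latitude", "39.123456");           // passes
Map<String, Object> errorMap = DataValidation.dataValidation(record);
// errorMap now contains SJHM/SJHM_ERROR and MAC/MAC_ERROR entries;
// in DataCheck a non-empty errorMap sends the record to the ES error index and drops it
System.out.println(errorMap);
}
}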
9、配置CDH上的Agent文件—跟FolderSource等里面读取配置文件相对应
Flume配置:
tier1.sources= source1
tier1.channels=channel1
tier1.sinks=sink1
#定义source1
tier1.sources.source1.type = com.hsiehchou.flume.source.FolderSource
#读取文件之后睡眠时间
tier1.sources.source1.sleeptime=5
tier1.sources.source1.filenum=3000
tier1.sources.source1.dirs =/usr/chl/data/filedir/
tier1.sources.source1.successfile=/usr/chl/data/filedir_successful/
tier1.sources.source1.deserializer.outputCharset=UTF-8
tier1.sources.source1.channels = channel1
# 定义拦截器1
tier1.sources.source1.interceptors=i1
tier1.sources.source1.interceptors.i1.type=com.hsiehchou.flume.interceptor.DataCleanInterceptor$Builder
#定义channel
tier1.channels.channel1.type = memory
tier1.channels.channel1.keep-alive= 300
tier1.channels.channel1.capacity = 1000000
tier1.channels.channel1.transactionCapacity = 5000
tier1.channels.channel1.byteCapacityBufferPercentage = 200
tier1.channels.channel1.byteCapacity = 80000
#定义sink1
tier1.sinks.sink1.type = com.hsiehchou.flume.sink.KafkaSink
tier1.sinks.sink1.kafkatopics = chl_test7
tier1.sinks.sink1.channel = channel1
The Flume source keeps monitoring the FTP upload directory; each record is cleaned by the custom interceptor, pushed into the Flume channel, and finally sunk to Kafka by the custom sink.
10、flume打包到服务器执行
Note: the custom jar must not be placed under the default /usr/lib/flume-ng/plugins.d directory; put it into the plugin directory created below instead.
mkdir -p /var/lib/flume-ng/plugins.d/chl/lib
mkdir -p /usr/chl/data/filedir/
mkdir -p /usr/chl/data/filedir_successful/
Set the directory permissions to 777: the Flume agent runs as the flume user, so the permissions have to be changed.
chmod 777 /usr/chl/data/filedir/
kafka-topics --zookeeper hadoop1:2181 --topic chl_test7 --create --replication-factor 1 --partitions 3
kafka-topics --zookeeper hadoop1:2181 --list
kafka-topics --zookeeper hadoop1:2181 --delete --topic chl_test7
kafka-console-consumer --bootstrap-server hadoop1:9092 --topic chl_test7 --from-beginning
六、Kafka开发
xz_bigdata_kafka
1、pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_kafka</artifactId>
<name>xz_bigdata_kafka</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<poi.version>3.14</poi.version>
<kafka.version>0.9.0-kafka-2.0.2</kafka.version>
<mysql.connector.version>5.1.46</mysql.connector.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>${zookeeper.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>${kafka.version}</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>${poi.version}</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<artifactId>scala-reflect</artifactId>
<groupId>org.scala-lang</groupId>
<version>${scala.version}</version>
</dependency>
</dependencies>
</project>
2、config/KafkaConfig.java—kafka配置文件 解析器
package com.hsiehchou.kafka.config;
import com.hsiehchou.common.config.ConfigUtil;
import kafka.producer.ProducerConfig;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;
/**
* kafka配置文件 解析器
*/
public class KafkaConfig {
private static final Logger LOG = LoggerFactory.getLogger(KafkaConfig.class);
private static final String DEFAULT_CONFIG_PATH = "kafka/kafka-server-config.properties";
private volatile static KafkaConfig kafkaConfig = null;
private ProducerConfig config;
private Properties properties;
private KafkaConfig() throws IOException{
try {
properties = ConfigUtil.getInstance().getProperties(DEFAULT_CONFIG_PATH);
} catch (Exception e) {
IOException ioException = new IOException();
ioException.addSuppressed(e);
throw ioException;
}
config = new ProducerConfig(properties);
}
public static KafkaConfig getInstance(){
if(kafkaConfig == null){
synchronized (KafkaConfig.class) {
if(kafkaConfig == null){
try {
kafkaConfig = new KafkaConfig();
} catch (IOException e) {
LOG.error("实例化kafkaConfig失败", e);
}
}
}
}
return kafkaConfig;
}
public ProducerConfig getProducerConfig(){
return config;
}
/**
* 获取当前时间的字符串 格式为:yyyy-MM-dd HH:mm:ss
* @return String
*/
public static String nowStr(){
return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format( new Date() );
}
}
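The kafka/kafka-server-config.properties file itself is not listed in this document. Since it is passed straight to the old Scala producer's ProducerConfig, it would roughly contain the standard producer keys below; the broker address and values are assumptions and must be adjusted to the actual cluster:
metadata.broker.list=hadoop1:9092
serializer.class=kafka.serializer.StringEncoder
request.required.acks=1
producer.type=sync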
3、producer/StringProducer.java—生产者
package com.hsiehchou.kafka.producer;
import com.hsiehchou.common.thread.ThreadPoolManager;
import com.hsiehchou.kafka.config.KafkaConfig;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
public class StringProducer {
private static final Logger LOG = LoggerFactory.getLogger(StringProducer.class);
public static void main(String[] args) {
StringProducer.producer("chl_test2","{\"rksj\":\"1558177156\",\"latitude\":\"24.000000\",\"imsi\":\"000000000000000\",\"accept_message\":\"\",\"phone_mac\":\"aa-aa-aa-aa-aa-aa\",\"device_mac\":\"bb-bb-bb-bb-bb-bb\",\"message_time\":\"1789098762\",\"filename\":\"wechat_source1_1111119.txt\",\"absolute_filename\":\"/usr/chl/data/filedir_successful/2019-05-18/data/filedir/wechat_source1_1111119.txt\",\"phone\":\"18609765432\",\"device_number\":\"32109231\",\"imei\":\"000000000000000\",\"id\":\"1792d6529e2143fa85717e706403c83c\",\"collect_time\":\"1557305988\",\"send_message\":\"\",\"table\":\"wechat\",\"object_username\":\"judy\",\"longitude\":\"23.000000\",\"username\":\"andiy\"}");
}
private static int threadSize = 6;
/**
* 生产单条消息 单条推送
* @param topic
* @param recourd
*/
public static void producer(String topic,String recourd){
Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
KeyedMessage<String, String> keyedMessage = new KeyedMessage<>(topic, recourd);
producer.send(keyedMessage);
LOG.info("发送数据"+recourd+"到kafka成功");
producer.close();
}
/**
* 批量推送
* @param topic
* @param listRecourd
*/
public static void producerList(String topic,List<String> listRecourd){
Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
List<KeyedMessage<String, String>> listKeyedMessage= new ArrayList<>();
listRecourd.forEach(recourd->{
listKeyedMessage.add(new KeyedMessage<>(topic, recourd));
});
producer.send(listKeyedMessage);
producer.close();
}
/**
* 多线程推送
* @param topic kafka topic
* @param listMessage 消息
* @throws Exception
*/
public void producer(String topic,List<String> listMessage) throws Exception{
// int size = listMessage.size();
List<List<String>> lists = splitList(listMessage, 5);
int threadNum = lists.size();
long t1 = System.currentTimeMillis();
CountDownLatch cdl = new CountDownLatch(threadNum);
//使用线程池
ExecutorService executorService = ThreadPoolManager.getInstance().getExecutorService();
LOG.info("开启 " + threadNum + " 个线程来向 topic " + topic + " 生产数据 . ");
for (int i = 0; i < threadNum; i++) {
try {
executorService.execute(new ProducerTask(topic,lists.get(i),cdl));
} catch (Exception e) {
LOG.error("", e);
}
}
cdl.await();
long t = System.currentTimeMillis() - t1;
LOG.info( " 一共耗时 :" + t + " 毫秒 ... " );
executorService.shutdown();
}
/**
* 拆分消息集合,计算使用多少个线程执行运算
* @param mtList
*/
public static List<List<String>> splitList(List<String> mtList, int splitSize){
if(mtList == null || mtList.size()==0){
return null;
}
int length = mtList.size();
// 计算可以分成多少组
int num = ( length + splitSize - 1 )/splitSize ;
List<List<String>> spiltList = new ArrayList<>(num);
for (int i = 0; i < num; i++) {
// 开始位置
int fromIndex = i * splitSize;
// 结束位置
int toIndex = (i+1) * splitSize < length ? ( i+1 ) * splitSize : length ;
spiltList.add(mtList.subList(fromIndex,toIndex)) ;
}
return spiltList;
}
class ProducerTask implements Runnable{
private String topic;
private List<String> listRecourd;
private CountDownLatch cdl;
public ProducerTask( String topic, List<String> listRecourd, CountDownLatch cdl){
this.topic = topic;
this.listRecourd = listRecourd;
this.cdl = cdl;
}
public void run() {
try {
producerList(topic,listRecourd);
} finally {
//通知 CountDownLatch,否则 producer() 中的 cdl.await() 会一直阻塞
cdl.countDown();
}
}
}
/* public static void producer(String topic,List<KeyedMessage<String,String>> listMessage) throws Exception{
int size = listMessage.size();
int threads = ( ( size - 1 ) / threadSize ) + 1;
long t1 = System.currentTimeMillis();
CountDownLatch cdl = new CountDownLatch(threads);
//使用线程池
ExecutorService executorService = ThreadPoolManager.getInstance().getExecutorService();
LOG.info("开启 " + threads + " 个线程来向 topic " + topic + " 生产数据 . ");
*//* for( int i = 0 ; i < threads ; i++ ){
executorService.execute( new StringProducer.ChildProducer( start , end , topic , id, cdl ));
}*//*
cdl.await();
long t = System.currentTimeMillis() - t1;
LOG.info( " 一共耗时 :" + t + " 毫秒 ... " );
executorService.shutdown();
}
static class ChildProducer implements Runnable{
public ChildProducer( int start , int end , String topic , String id, CountDownLatch cdl ){
}
public void run() {
}
}
*/
}
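Besides the single-record main() above, producerList can push a small batch in one call. A usage sketch with made-up JSON records (topic name taken from the Flume sink configuration):
import java.util.Arrays;
import java.util.List;
import com.hsiehchou.kafka.producer.StringProducer;
public class StringProducerDemo {
public static void main(String[] args) {
// Two fake wechat records pushed to the chl_test7 topic in one batch
List<String> records = Arrays.asList(
"{\"table\":\"wechat\",\"phone\":\"18609765432\",\"phone_mac\":\"aa-aa-aa-aa-aa-aa\"}",
"{\"table\":\"wechat\",\"phone\":\"13888888888\",\"phone_mac\":\"bb-bb-bb-bb-bb-bb\"}");
StringProducer.producerList("chl_test7", records);
}
}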
七、Spark—kafka2es开发
The matching Maven dependency coordinates can be looked up in the Cloudera repository.
SparkStreaming + Kafka can consume data in two modes: receiver mode and Direct mode.
SparkStreaming + Kafka receiver mode
How receiver mode works
After the SparkStreaming application starts, receiver tasks running in the Executors receive the data pushed by Kafka. The data is persisted, by default at storage level MEMORY_AND_DISK_SER_2 (this level can be changed). The receiver task stores and replicates the received data, which involves data transfer between nodes. Once replication completes, the consumer offset is updated in zookeeper and the data location is reported to the receiver tracker in the Driver. Finally the Driver dispatches tasks to the nodes according to data locality.
Problems with receiver mode
When the Driver process dies, all of its Executors are killed with it. If the Driver dies right after the consumer offset in zookeeper has been updated but before the received data has been processed, that data can no longer be found, which effectively means losing data.
How to solve this problem?
Enable the WAL (write ahead log) mechanism: while the received data is being replicated to other nodes, an extra copy is written to HDFS (the persistence level of the received data should then be downgraded to MEMORY_AND_DISK). This guarantees the safety of the data. However, writing to HDFS is expensive, and zookeeper can only be updated and the data location reported after the backup completes, so the job takes longer and the latency of the task increases.
Notes
1) After enabling the WAL, the persistence level of the received data has to be downgraded, which costs some efficiency.
2) Enabling the WAL requires a checkpoint directory.
3) Enabling the WAL (write ahead log) writes an extra copy of the data to HDFS (see the sketch below for how to turn it on).
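A minimal sketch of enabling the WAL for receiver mode; the app name and HDFS paths are assumptions and this is not part of the project code:
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
public class ReceiverWalSketch {
public static void main(String[] args) {
SparkConf conf = new SparkConf()
.setAppName("kafka2es_receiver")
// write every received block to HDFS as well
.set("spark.streaming.receiver.writeAheadLog.enable", "true");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
// the WAL requires a checkpoint directory (HDFS path is an assumption)
jssc.checkpoint("hdfs://hadoop1:8020/user/chl/checkpoint/kafka2es");
// as noted above, the receiver storage level should also be downgraded to
// StorageLevel.MEMORY_AND_DISK when calling KafkaUtils.createStream
}
}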
SparkStreaming + Kafka Direct mode
- Simplified processing pipeline
- Offsets are stored and managed by ourselves, which guarantees zero data loss but may cause duplicate consumption (consumer idempotence has to be handled)
- No receiver is needed; the data is pulled from kafka directly
1、spark下的pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_spark</artifactId>
<name>xz_bigdata_spark</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<spark.version>1.6.0</spark.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_es</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_redis</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_hbase</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<artifactId>gson</artifactId>
<groupId>com.google.code.gson</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>httpcore</artifactId>
<groupId>org.apache.httpcomponents</groupId>
</exclusion>
<exclusion>
<artifactId>httpclient</artifactId>
<groupId>org.apache.httpcomponents</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-api</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>gson</artifactId>
<groupId>com.google.code.gson</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-13_2.10</artifactId>
<version>6.2.3</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin><!--打包依赖的jar包-->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
<stripVersion>false</stripVersion> <!-- 去除版本信息 -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- 拷贝项目依赖包到lib/目录下 -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
2、spark中的文件结构
点击”+”号,选择Scala SDK,点击Browse,选择本地下载的scala-sdk-2.10.4
3、xz_bigdata_spark/spark/common
SparkContextFactory.scala
package com.hsiehchou.spark.common
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{Accumulator, SparkContext}
object SparkContextFactory {
def newSparkBatchContext(appName:String = "sparkBatch") : SparkContext = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
new SparkContext(sparkConf)
}
def newSparkLocalBatchContext(appName:String = "sparkLocalBatch" , threads : Int = 2) : SparkContext = {
val sparkConf = SparkConfFactory.newSparkLoalConf(appName, threads)
sparkConf.set("","")
new SparkContext(sparkConf)
}
def getAccumulator(appName:String = "sparkBatch") : Accumulator[Int] = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
val accumulator: Accumulator[Int] = new SparkContext(sparkConf).accumulator(0,"")
accumulator
}
/**
* 创建本地流streamingContext
* @param appName appName
* @param batchInterval 多少秒读取一次
* @param threads 开启多少个线程
* @return
*/
def newSparkLocalStreamingContext(appName:String = "sparkStreaming" ,
batchInterval:Long = 30L ,
threads : Int = 4) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkLocalConf(appName, threads)
// sparkConf.set("spark.streaming.receiver.maxRate","10000")
sparkConf.set("spark.streaming.kafka.maxRatePerPartition","1")
new StreamingContext(sparkConf, Seconds(batchInterval))
}
/**
* 创建集群模式streamingContext
* 这里不设置线程数,在submit中指定
* @param appName
* @param batchInterval
* @return
*/
def newSparkStreamingContext(appName:String = "sparkStreaming" , batchInterval:Long = 30L) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkStreamingConf(appName)
new StreamingContext(sparkConf, Seconds(batchInterval))
}
def startSparkStreaming(ssc:StreamingContext){
ssc.start()
ssc.awaitTermination()
ssc.stop()
}
}
convert/DataConvert.scala
package com.hsiehchou.spark.common.convert
import java.util
import com.hsiehchou.common.config.ConfigUtil
import org.apache.spark.Logging
import scala.collection.JavaConversions._
/**
* 数据类型转换
*/
object DataConvert extends Serializable with Logging {
val fieldMappingPath = "es/mapping/fieldmapping.properties"
private val typeFieldMap: util.HashMap[String, util.HashMap[String, String]] = getEsFieldtypeMap()
/**
* 将Map<String,String>转化为Map<String,Object>
*/
def strMap2esObjectMap(map:util.Map[String,String]):util.Map[String,Object] ={
//获取配置文件中的数据类型
val dataType = map.get("table")
//获取配置文件中的数据类型的 字段类型
val fieldMap = typeFieldMap.get(dataType)
//获取数据类型的所有字段,配置文件里的字段
val keySet = fieldMap.keySet()
//var objectMap:util.HashMap[String,Object] = new util.HashMap[String,Object]()
var objectMap = new java.util.HashMap[String, Object]()
//数据里的字段
val set = map.keySet().iterator()
try {
//遍历真实数据的所有字段
while (set.hasNext()) {
val key = set.next()
var dataType:String = "string"
//如果在配置文件中的key包含真实数据的key
if (keySet.contains(key)) {
//则获取真实数据字段的数据类型
dataType = fieldMap.get(key)
}
dataType match {
case "long" => objectMap = BaseDataConvert.mapString2Long(map, key, objectMap)
case "string" => objectMap = BaseDataConvert.mapString2String(map, key, objectMap)
case "double" => objectMap = BaseDataConvert.mapString2Double(map, key, objectMap)
case _ => objectMap = BaseDataConvert.mapString2String(map, key, objectMap)
}
}
}catch {
case e: Exception => logInfo("转换异常", e)
}
println("转换后" + objectMap)
objectMap
}
/**
* 读取 "es/mapping/fieldmapping.properties 配置文件
* 主要作用是将 真实数据 根据配置来作数据类型转换 转换为和ES mapping结构保持一致
* @return
*/
def getEsFieldtypeMap(): util.HashMap[String, util.HashMap[String, String]] = {
// ["wechat":["phone_mac":"string","latitude":"long"]]
//定义返回Map
val mapMap = new util.HashMap[String, util.HashMap[String, String]]
val properties = ConfigUtil.getInstance().getProperties(fieldMappingPath)
val tables = properties.get("tables").toString.split(",")
val tableFields = properties.keySet()
tables.foreach(table => {
val map = new util.HashMap[String, String]()
tableFields.foreach(tableField => {
if (tableField.toString.startsWith(table)) {
val key = tableField.toString.split("\\.")(1)
val value = properties.get(tableField).toString
map.put(key, value)
}
})
mapMap.put(table, map)
})
mapMap
}
}
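The es/mapping/fieldmapping.properties file is not listed in this document. Based on the parsing logic above (a tables key plus table.field=type entries where the type is one of string/long/double), it would look roughly like the sketch below; the table names, fields and types are only an illustration and must match the real ES mapping:
tables=wechat,mail,search
wechat.phone_mac=string
wechat.longitude=double
wechat.latitude=double
wechat.collect_time=long
mail.send_time=long
mail.mail_content=string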
4、org/apache/spark/streaming/kafka/KafkaManager.scala
KafkaCluster is needed when building the Kafka stream. It lives under org.apache.spark.streaming.kafka and is private to that Spark package, so we create the same package structure in our own module and can then reference it, as shown in the figure below:
package org.apache.spark.streaming.kafka
import com.alibaba.fastjson.TypeReference
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.{Decoder, StringDecoder}
import org.apache.spark.Logging
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import scala.reflect.ClassTag
/**
* 包名说明 :KafkaCluster是私有类,只能在spark包中使用,
* 所以包名保持和 KafkaCluster 一致才能调用
* @param kafkaParams
* @param autoUpdateoffset
*/
class KafkaManager(val kafkaParams:Map[String, String],
val autoUpdateoffset:Boolean =true) extends Serializable with Logging {
//构造一个KafkaCluster
@transient
private var cluster = new KafkaCluster(kafkaParams)
//定义一个单例
def kc(): KafkaCluster = {
if (cluster == null) {
cluster = new KafkaCluster(kafkaParams)
}
cluster
}
/**
* 泛型流读取器
* @param ssc
* @param topics kafka topics,多个topic按","分割
* @tparam K 泛型 K
* @tparam V 泛型 V
* @tparam KD scala泛型 KD <: Decoder[K] 说明KD 的类型必须是Decoder[K]的子类型 上下界
* @tparam VD scala泛型 VD <: Decoder[V] 说明VD 的类型必须是Decoder[V]的子类型 上下界
* @return
*/
def createDirectStream[K: ClassTag, V: ClassTag,
KD <: Decoder[K] : ClassTag,
VD <: Decoder[V] : ClassTag](ssc: StreamingContext, topics: Set[String]): InputDStream[(K, V)] = {
//获取消费者组ID
//val groupId = "test"
val groupId = kafkaParams.get("group.id").getOrElse("default")
// 在zookeeper上读取offsets前先根据实际情况更新offsets
setOrUpdateOffsets(topics, groupId)
//把所有的offsets处理完成,就可以从zookeeper上读取offset开始消费message
val messages = {
//获取kafka分区信息 为了打印信息
val partitionsE = kc.getPartitions(topics)
require(partitionsE.isRight, s"获取 kafka topic ${topics}`s partition 失败。")
val partitions = partitionsE.right.get
println("打印分区信息")
partitions.foreach(println(_))
//获取分区的offset
val consumerOffsetsE = kc.getConsumerOffsets(groupId, partitions)
require(consumerOffsetsE.isRight, s"获取 kafka topic ${topics}`s consumer offsets 失败。")
val consumerOffsets = consumerOffsetsE.right.get
println("打印消费者分区偏移信息")
consumerOffsets.foreach(println(_))
//读取数据
KafkaUtils.createDirectStream[K, V, KD, VD, (K, V)](
ssc, kafkaParams, consumerOffsets, (mmd: MessageAndMetadata[K, V]) => (mmd.key, mmd.message))
}
if (autoUpdateoffset) {
//更新offset
messages.foreachRDD(rdd => {
logInfo("RDD 消费成功,开始更新zookeeper上的偏移")
updateZKOffsets(rdd)
})
}
messages
}
/**
* 创建数据流前,根据实际消费情况更新消费offsets
* @param topics
* @param groupId
*/
private def setOrUpdateOffsets(topics: Set[String], groupId: String): Unit = {
topics.foreach(topic => {
//先获取Kafka offset信息 Kafka partions的节点信息
//获取kafka本身的偏移量, Either类型可以认为就是封装了2种信息
val partitionsE = kc.getPartitions(Set(topic))
logInfo(partitionsE + "")
//require(partitionsE.isRight, "获取partition失败")
require(partitionsE.isRight, s"获取 kafka topic ${topic}`s partition 失败。")
println("partitionsE=" + partitionsE)
val partitions = partitionsE.right.get
println("打印分区信息")
partitions.foreach(println(_))
//获取kafka partions最早的offsets
val earliestLeader = kc.getEarliestLeaderOffsets(partitions)
require(earliestLeader.isRight, "获取earliestLeader失败")
val earliestLeaderOffsets = earliestLeader.right.get
println("kafka最早的消息偏移量")
earliestLeaderOffsets.foreach(println(_))
//获取kafka最末尾的offsets
val latestLeader = kc.getLatestLeaderOffsets(partitions)
//require(latestLeader.isRight, "获取latestLeader失败")
val latestLeaderOffsets = latestLeader.right.get
println("kafka最末尾的消息偏移量")
latestLeaderOffsets.foreach(println(_))
//获取消费者的offsets
val consumerOffsetsE = kc.getConsumerOffsets(groupId, partitions)
//判断消费者是否消费过,消费者offset存在
if (consumerOffsetsE.isRight) {
/**
* 如果zk上保存的offsets已经过时了,即kafka的定时清理策略已经将包含该offsets的文件删除。
* 针对这种情况,只要判断一下zk上的consumerOffsets和earliestLeaderOffsets的大小,
* 如果consumerOffsets比earliestLeaderOffsets还小的话,说明consumerOffsets已过时,
* 这时把consumerOffsets更新为earliestLeaderOffsets
*/
//如果消费过,直接取过来的kafka消费,,earliestLeader 存在
if (earliestLeader.isRight) {
//获取到最早的offset 也就是最小的offset
require(earliestLeader.isRight, "获取earliestLeader失败")
val earliestLeaderOffsets = earliestLeader.right.get
//获取消费者组的offset
val consumerOffsets = consumerOffsetsE.right.get
// 将 consumerOffsets 和 earliestLeaderOffsets 的offsets 做比较
// 可能只是存在部分分区consumerOffsets过时,所以只更新过时分区的consumerOffsets为earliestLeaderOffsets
var offsets: Map[TopicAndPartition, Long] = Map()
consumerOffsets.foreach({ case (tp, n) =>
val earliestLeaderOffset = earliestLeaderOffsets(tp).offset
//if the consumer offset is smaller than kafka's earliest offset it is stale, so reset it to the earliest offset and write it back to zk
if (n < earliestLeaderOffset) {
logWarning("consumer group:" + groupId + ",topic:" + tp.topic + ",partition:" + tp.partition +
" offsets已经过时,更新为" + earliestLeaderOffset)
offsets += (tp -> earliestLeaderOffset)
}
})
//设置offsets
setOffsets(groupId, offsets)
}
} else {
//如果没有消费过,那么就去取kafka获取earliestLeader写到zk中
// 消费者还没有消费过 也就是zookeeper中还没有消费者的信息
if (earliestLeader.isLeft)
logError(s"${topic} hasConsumed but earliestLeaderOffsets is null。")
//看是从头消费还是从末开始消费 smallest表示从头开始消费
val reset = kafkaParams.get("auto.offset.reset").map(_.toLowerCase).getOrElse("smallest")
//往zk中去写,构建消费者 偏移
var leaderOffsets: Map[TopicAndPartition, Long] = Map.empty
//从头消费
if (reset.equals("smallest")) {
//分为 存在 和 不存在 最早的消费记录 两种情况
//如果kafka 最小偏移存在,则将消费者偏移设置为和kafka偏移一样
if (earliestLeader.isRight) {
leaderOffsets = earliestLeader.right.get.map {
case (tp, offset) => (tp, offset.offset)
}
} else {
//如果不存在,则从新构建偏移全部为0 offsets
leaderOffsets = partitions.map(tp => (tp, 0L)).toMap
}
} else {
//直接获取最新的offset
leaderOffsets = kc.getLatestLeaderOffsets(partitions).right.get.map {
case (tp, offset) => (tp, offset.offset)
}
}
//设置offsets 写到zk中
setOffsets(groupId, leaderOffsets)
}
})
}
/**
* 设置消费者组的offsets
* @param groupId
* @param offsets
*/
private def setOffsets(groupId: String, offsets: Map[TopicAndPartition, Long]): Unit = {
if (offsets.nonEmpty) {
//更新offset
val o = kc.setConsumerOffsets(groupId, offsets)
logInfo(s"更新zookeeper中消费组为:${groupId} 的 topic offset信息为: ${offsets}")
if (o.isLeft) {
logError(s"Error updating the offset to Kafka cluster: ${o.left.get}")
}
}
}
/**
* 通过spark的RDD 更新zookeeper上的消费offsets
* @param rdd
*/
def updateZKOffsets[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) : Unit = {
//获取消费者组
val groupId = kafkaParams.get("group.id").getOrElse("default")
//when spark consumes kafka through the low-level API, each partition's offset is kept inside the RDD itself,
//so it can be read here from HasOffsetRanges. Spark does not persist this information to zookeeper,
//which is why we store these offsets in zookeeper ourselves
val offsetsList = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
//打印出spark中保存的offsets信息
offsetsList.foreach(x=>{
println("获取spark 中的偏移信息"+x)
})
for (offsets <- offsetsList) {
//根据topic和partition 构建topicAndPartition
val topicAndPartition = TopicAndPartition(offsets.topic, offsets.partition)
logInfo("将SPARK中的 偏移信息 存到zookeeper中")
//将消费者组的offsets更新到zookeeper中
setOffsets(groupId, Map((topicAndPartition, offsets.untilOffset)))
}
}
//(null,{"rksj":"1558178497","latitude":"24.000000","imsi":"000000000000000"})
//读取kafka流,并将json数据转为map
def createJsonToJMapObjectDirectStreamWithOffset(ssc:StreamingContext, topicsSet:Set[String]): DStream[java.util.Map[String,Object]] = {
//一个转换器
val converter = {json:String =>
println(json)
var res : java.util.Map[String,Object] = null
try {
//JSON转map的操作
res = com.alibaba.fastjson.JSON.parseObject(json,
new TypeReference[java.util.Map[String, Object]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
/**
* 根据converter创建流数据
* @param ssc
* @param topicsSet
* @param converter
* @tparam T
* @return
*/
def createDirectStreamWithOffset[T:ClassTag](ssc:StreamingContext,
topicsSet:Set[String], converter:String => T): DStream[T] = {
createDirectStream[String, String, StringDecoder, StringDecoder](ssc, topicsSet)
.map(pair =>converter(pair._2))
}
def createJsonToJMapDirectStreamWithOffset(ssc:StreamingContext,
topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
val converter = {json:String =>
var res : java.util.Map[String,String] = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
/*
/**
* @param ssc
* @param topicsSet
* @return
*/
def createJsonToJavaBeanDirectStreamWithOffset(ssc:StreamingContext ,
topicsSet:Set[String]): DStream[Object] = {
val converter = {json:String =>
var res : Object = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[Object]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
*/
/*
def createStringDirectStreamWithOffset(ssc:StreamingContext ,
topicsSet:Set[String]): DStream[String] = {
val converter = {json:String =>
json
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
*/
/**
* Reads a JSON stream, converts it into a Map stream, and records the consumed offsets in zookeeper per RDD.
* @param ssc spark StreamingContext
* @param topicsSet set of kafka topics to read from (multiple topics supported)
* @return DStream[java.util.Map[String, String]]
*/
def createJsonToJMapStringDirectStreamWithOffset(ssc:StreamingContext , topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
val converter = {json:String =>
var res : java.util.Map[String,String] = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
/**
* Reads a JSON stream and converts it into a Map stream. Despite the "WithoutOffset" name,
* the current implementation still delegates to createDirectStreamWithOffset, so offsets are tracked in zookeeper as well.
* @param ssc spark StreamingContext
* @param topicsSet set of kafka topics to read from (multiple topics supported)
* @return DStream[java.util.Map[String, String]]
*/
def createJsonToJMapStringDirectStreamWithoutOffset(ssc:StreamingContext , topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
val converter = {json:String =>
var res : java.util.Map[String,String] = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
}
object KafkaManager extends Logging{
def apply(broker:String, groupId:String = "default",
numFetcher:Int = 1, offset:String = "smallest",
autoUpdateoffset:Boolean = true): KafkaManager ={
new KafkaManager(
createKafkaParam(broker, groupId, numFetcher, offset),
autoUpdateoffset)
}
def createKafkaParam(broker:String, groupId:String = "default",
numFetcher:Int = 1, offset:String = "smallest"): Map[String, String] ={
//创建 stream 时使用的 topic 名字集合
Map[String, String](
"metadata.broker.list" -> broker,
"auto.offset.reset" -> offset,
"group.id" -> groupId,
"num.consumer.fetchers" -> numFetcher.toString)
}
}
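The two pieces above (KafkaManager plus its companion object) are everything a job needs to consume kafka with zookeeper-backed offsets. Below is a minimal usage sketch, not part of the project code; the broker addresses, group id and topic are placeholders, and the real entry point is Kafka2esStreaming in section 7.

```scala
// Hypothetical wiring of KafkaManager into a small streaming job; all names are placeholders.
import org.apache.spark.streaming.kafka.KafkaManager
import com.hsiehchou.spark.common.SparkContextFactory

object KafkaManagerDemo {
  def main(args: Array[String]): Unit = {
    // 10-second micro-batches, created through the same factory Kafka2esStreaming uses
    val ssc = SparkContextFactory.newSparkStreamingContext("KafkaManagerDemo", java.lang.Long.valueOf(10))
    // apply() builds the kafkaParams map (auto.offset.reset=smallest, 1 fetcher) and enables automatic offset commit to zookeeper
    val kafkaManager = KafkaManager("hadoop1:9092,hadoop2:9092,hadoop3:9092", "demo_group")
    // each record arrives as a java.util.Map[String, String] parsed from the JSON payload
    val stream = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, Set("chl_test7"))
    stream.foreachRDD(rdd => println(s"consumed ${rdd.count()} records in this batch"))
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because autoUpdateoffset defaults to true, updateZKOffsets runs after every processed RDD, so a restarted job resumes from the offsets recorded in zookeeper.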
5、resources/log4j.properties
### 设置###
log4j.rootLogger = error,stdout,D,E
### console output ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
### DEBUG and above go to E://logs/log.log ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = E://logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
### ERROR and above go to E://logs/error.log ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =E://logs/error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
6、xz_bigdata_spark/spark/streaming/kafka
Spark_Es_ConfigUtil.scala
package com.hsiehchou.spark.streaming.kafka
import org.apache.spark.Logging
object Spark_Es_ConfigUtil extends Serializable with Logging{
// val ES_NODES = "es.cluster.nodes"
// val ES_PORT = "es.cluster.http.port"
// val ES_CLUSTERNAME = "es.cluster.name"
val ES_NODES = "es.nodes"
val ES_PORT = "es.port"
val ES_CLUSTERNAME = "es.clustername"
def getEsParam(id_field : String): Map[String,String] ={
Map[String ,String]("es.mapping.id" -> id_field,
ES_NODES -> "hadoop1,hadoop2,hadoop3",
//ES_NODES -> "hadoop1",
ES_PORT -> "9200",
ES_CLUSTERNAME -> "xz_es",
"es.batch.size.entries"->"6000",
/* "es.nodes.wan.only"->"true",*/
"es.nodes.discovery"->"true",
"es.batch.size.bytes"->"300000000",
"es.batch.write.refresh"->"false"
)
}
}
Spark_Kafka_ConfigUtil.scala
package com.hsiehchou.spark.streaming.kafka
import org.apache.spark.Logging
object Spark_Kafka_ConfigUtil extends Serializable with Logging{
def getKafkaParam(brokerList:String,groupId : String): Map[String,String]={
val kafkaParam=Map[String,String](
"metadata.broker.list" -> brokerList,
"auto.offset.reset" -> "smallest",
"group.id" -> groupId,
"refresh.leader.backoff.ms" -> "1000",
"num.consumer.fetchers" -> "8")
kafkaParam
}
}
7、kafka2es
Kafka2esJob.scala
package com.hsiehchou.spark.streaming.kafka.kafka2es
import com.hsiehchou.es.admin.AdminUtil
import com.hsiehchou.es.client.ESClientUtils
import com.hsiehchou.spark.common.convert.DataConvert
import com.hsiehchou.spark.streaming.kafka.Spark_Es_ConfigUtil
import org.apache.spark.Logging
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.spark.rdd.EsSpark
object Kafka2esJob extends Serializable with Logging {
/**
* 按日期分组写入ES
* @param dataType
* @param typeDS
*/
def insertData2EsBydate(dataType:String,typeDS:DStream[java.util.Map[String,String]]): Unit ={
//通过 dataType + 日期来动态创建 分索引。 日期格式为 yyyyMMdd
//主要就是时间混杂 通过时间分组就行了 groupby filter
//index前缀 通过对日期进行过滤 避免shuffle操作
val index_prefix = dataType
val client: TransportClient = ESClientUtils.getClient
typeDS.foreachRDD(rdd=>{
//for a small data set a plain rdd.groupBy() would also work
//collect all distinct dates present in this batch
val days = getDays(dataType,rdd)
//filter the data by date; .par is a scala parallel collection, so the per-day writes run concurrently
days.par.foreach(day=>{
//prefix + date forms the dynamic index name, e.g. qq + "_" + "20190508"
val index = index_prefix + "_" + day
//判断索引是否存在
val bool = AdminUtil.indexExists(client,index)
if(!bool){
//如果不存在,创建
val mappingPath = s"es/mapping/${index_prefix}.json"
AdminUtil.buildIndexAndTypes(index, index, mappingPath, 5, 1)
}
//构建RDD,数据类型 某一天的数据RDD
//返回一个map[String,obJECT] 的RDD //就是一个单一类型 单一天数的RDD
val tableRDD = rdd.filter(map=>{
day.equals(map.get("index_date"))
}).map(x=>{
//将map[String,String] 转为map[String,obJECT]
DataConvert.strMap2esObjectMap(x)
})
EsSpark.saveToEs(tableRDD,index+ "/"+index,Spark_Es_ConfigUtil.getEsParam("id"))
})
})
//日期为后
}
/**
* 获取日期的集合
* @param dataType
* @param rdd
* @return
*/
def getDays(dataType:String,rdd:RDD[java.util.Map[String,String]]): Array[String] ={
//对日期去重,然后集中到driver
return rdd.map(x=>{x.get("index_date")}).distinct().collect()
}
/**
* 将RDD转换之后写入ES
* @param dataType
* @param typeRDD
*/
def insertData2Es(dataType:String,typeRDD:RDD[java.util.Map[String,String]]): Unit = {
val index = dataType
val esRDD = typeRDD.map(x=>{
DataConvert.strMap2esObjectMap(x)
})
EsSpark.saveToEs(esRDD,index+ "/"+index,Spark_Es_ConfigUtil.getEsParam("id"))
println("写入ES" + esRDD.count() + "条数据成功")
}
/**
* 将RDD转换后写入ES
* @param dataType
* @param typeDS
*/
def insertData2Es(dataType:String, typeDS:DStream[java.util.Map[String, String]]): Unit = {
val index = dataType
typeDS.foreachRDD(rdd=>{
val esRDD = rdd.map(x=>{
DataConvert.strMap2esObjectMap(x)
})
EsSpark.saveToEs(rdd, dataType+"/"+dataType, Spark_Es_ConfigUtil.getEsParam("id"))
println("写入ES" + esRDD.count() + "条数据成功")
})
}
}
Kafka2esStreaming.scala
package com.hsiehchou.spark.streaming.kafka.kafka2es
import java.util
import java.util.Properties
import com.hsiehchou.common.config.ConfigUtil
import com.hsiehchou.common.project.datatype.DataTypeProperties
import com.hsiehchou.common.time.TimeTranstationUtils
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import org.apache.commons.lang3.StringUtils
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaManager
import scala.collection.JavaConversions._
object Kafka2esStreaming extends Serializable with Logging {
//获取数据类型
private val dataTypes: util.Set[String] = DataTypeProperties.dataTypeMap.keySet()
val kafkaConfig: Properties = ConfigUtil.getInstance().getProperties("kafka/kafka-server-config.properties")
def main(args: Array[String]): Unit = {
//val topics = "chl_test7".split(",")
val topics = args(1).split(",")
// val ssc = SparkConfFactory.newSparkLocalStreamingContext("XZ_kafka2es", java.lang.Long.valueOf(10),1)
val ssc = SparkContextFactory.newSparkStreamingContext("Kafka2esStreaming", java.lang.Long.valueOf(10))
//构建kafkaManager
val kafkaManager = new KafkaManager(
Spark_Kafka_ConfigUtil.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"), "XZ3")
)
//使用kafkaManager创建DStreaming流
val kafkaDS = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
//添加一个日期分组字段
//如果数据其他的转换,可以先在这里进行统一转换
.map(map=>{
map.put("index_date",TimeTranstationUtils.Date2yyyyMMddHHmmss(java.lang.Long.valueOf(map.get("collect_time")+"000")))
map
}).persist(StorageLevel.MEMORY_AND_DISK)
//with a scala .par parallel collection the per-type jobs could be submitted concurrently when resources allow
dataTypes.foreach(datatype=>{
//过滤出单个类别的数据种类
val tableDS = kafkaDS.filter(x=>{datatype.equals(x.get("table"))})
Kafka2esJob.insertData2Es(datatype,tableDS)
})
ssc.start()
ssc.awaitTermination()
}
/**
* 启动参数检查
* @param args
*/
def sparkParamCheck(args: Array[String]): Unit ={
if (args.length == 4) {
if (StringUtils.isBlank(args(1))) {
logInfo("kafka集群地址不能为空")
logInfo("kafka集群地址格式为 主机1名:9092,主机2名:9092,主机3名:9092...")
logInfo("格式为 主机1名:9092,主机2名:9092,主机3名:9092...")
System.exit(-1)
}
if (StringUtils.isBlank(args(2))) {
logInfo("kafka topic1不能为空")
System.exit(-1)
}
if (StringUtils.isBlank(args(3))) {
logInfo("kafka topic2不能为空")
System.exit(-1)
}
}else{
logError("启动参数个数错误")
}
}
def startJob(ds:DStream[String]): Unit ={
}
}
java/com/hsiehchou/spark/common/convert/BaseDataConvert.java
package com.hsiehchou.spark.common.convert;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.HashMap;
import java.util.Map;
public class BaseDataConvert {
private static final Logger LOG = LoggerFactory.getLogger(BaseDataConvert.class);
public static HashMap<String,Object> mapString2Long(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (StringUtils.isNotBlank(logouttime)) {
objectMap.put(key, Long.valueOf(logouttime));
} else {
objectMap.put(key, 0L);
}
return objectMap;
}
public static HashMap<String,Object> mapString2Double(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (StringUtils.isNotBlank(logouttime)) {
objectMap.put(key, Double.valueOf(logouttime));
} else {
objectMap.put(key, 0.000000);
}
return objectMap;
}
public static HashMap<String,Object> mapString2String(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (StringUtils.isNotBlank(logouttime)) {
objectMap.put(key, logouttime);
} else {
objectMap.put(key, "");
}
return objectMap;
}
}
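Kafka2esJob calls DataConvert.strMap2esObjectMap before writing to ES, but that class itself is not listed here; the helpers above are its building blocks. The following is only a rough sketch of what such a converter could look like, assuming collect_time should be indexed as a long and longitude/latitude as doubles; the field names and types are assumptions, so adjust them to the real field standard.

```scala
// Hypothetical sketch only; the real DataConvert.strMap2esObjectMap may differ.
import java.util.HashMap
import scala.collection.JavaConversions._
import com.hsiehchou.spark.common.convert.BaseDataConvert

object DataConvertSketch {
  def strMap2esObjectMap(map: java.util.Map[String, String]): HashMap[String, Object] = {
    val objectMap = new HashMap[String, Object]()
    // numeric fields assumed by this sketch
    BaseDataConvert.mapString2Long(map, "collect_time", objectMap)
    BaseDataConvert.mapString2Double(map, "longitude", objectMap)
    BaseDataConvert.mapString2Double(map, "latitude", objectMap)
    // every remaining field is copied through as a string
    for (key <- map.keySet() if !objectMap.containsKey(key)) {
      BaseDataConvert.mapString2String(map, key, objectMap)
    }
    objectMap
  }
}
```

Typing the values this way keeps the ES mapping consistent with what EsSpark writes, instead of every field arriving as text.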
8、ES动态索引创建
The dynamic index creation logic is the insertData2EsBydate method already shown in Kafka2esJob.scala above: the index name is dataType + "_" + date, the mapping template es/mapping/<dataType>.json is applied the first time a day's index is created, and the batch is filtered per day before EsSpark.saveToEs writes it.
xz_bigdata_es下一节展示代码
9、CDH的java配置和Elasticsearch的配置
CDH JDK path
/usr/local/jdk1.8
Kafka configuration (values as set in CDH)
Default Number of Partitions (num.partitions): 8
Offset Commit Topic Number of Partitions: 180 days
Log Compaction Delete Record Retention Time (log.cleaner.delete.retention.ms): 30 days
Data Log Roll Hours (log.retention.hours / log.roll.hours): 30 days
Java Heap Size of Broker (broker_max_heap_size): 1 GiB
YARN
Container memory: 5g / 5g / 1g / 10g
这里的CDH安装另一篇文章介绍
前提安装好elasticsearch
mkdir /opt/software/elasticsearch/data/
mkdir /opt/software/elasticsearch/logs/
chmod 777 /opt/software/elasticsearch/data/
useradd elasticsearch
passwd elasticsearch
chown -R elasticsearch elasticsearch/
vim /etc/security/limits.conf
Add the following lines:
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
进入limits.d目录下修改配置文件
vim /etc/security/limits.d/90-nproc.conf
Change the nproc line to:
* soft nproc 4096   (set it to this value; on version 6 the default is already 4096)
修改配置sysctl.conf
vim /etc/sysctl.conf
添加下面配置:
vm.max_map_count=655360
并执行命令:
sysctl -p
hadoop1的conf配置
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: xz_es
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
node.name: node-1
node.master: true
node.data: true
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
path.data: /opt/software/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
path.logs: /opt/software/elasticsearch/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 192.168.116.201
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
jvm.options
修改下
-Xms64m
-Xmx64m
hadoop2的conf配置
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: xz_es
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: node-2
node.master: false
node.data: true
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
path.data: /opt/software/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
path.logs: /opt/software/elasticsearch/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 192.168.116.202
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
jvm.options
修改下
-Xms64m
-Xmx64m
hadoop3的conf配置
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: xz_es
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: node-3
node.master: false
node.data: true
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
path.data: /opt/software/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
path.logs: /opt/software/elasticsearch/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 192.168.116.203
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
jvm.options
修改下
-Xms64m
-Xmx64m
Kibana的conf配置
kibana.yml
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
#server.host: "localhost"
server.host: "192.168.116.202"
# Enables you to specify a path to mount Kibana at if you are running behind a proxy. This only affects
# the URLs generated by Kibana, your proxy is expected to remove the basePath value before forwarding requests
# to Kibana. This setting cannot end in a slash.
#server.basePath: ""
# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URL of the Elasticsearch instance to use for all your queries.
#elasticsearch.url: "http://localhost:9200"
elasticsearch.url: "http://192.168.116.201:9200"
运行Elasticsearch
cd /opt/software/elasticsearch
su elasticsearch
bin/elasticsearch &
运行Kibana
cd /opt/software/kibana/
bin/kibana &
10、kafka2es打包到集群执行
打包
使用maven工具点击install
放入集群
将打包完成的jar文件和xz_bigdata_spark-1.0-SNAPSHOT.jar 一起放入/usr/chl/spark7/目录下面
执行
spark-submit \
  --master yarn-cluster \
  --num-executors 1 \
  --driver-memory 500m \
  --executor-memory 1g \
  --executor-cores 1 \
  --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
  --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming \
  /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar chl_test7 chl_test7
The same command, flag by flag:
spark-submit
  --master yarn-cluster      //run on the YARN cluster
  --num-executors 1          //number of executor processes
  --driver-memory 500m       //driver memory
  --executor-memory 1g       //memory per executor
  --executor-cores 1         //cores (threads) per executor
  --jars $(echo /usr/chl/spark8/jars/*.jar | tr ' ' ',')   //dependency jars
  --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
11、运行截图
12、冲突查找快捷键
Ctrl+Alt+Shift+N
八、xz_bigdata_es开发
1、pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_es</artifactId>
<name>xz_bigdata_es</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>6.2.3</version>
</dependency>
<dependency>
<groupId>io.searchbox</groupId>
<artifactId>jest</artifactId>
<version>6.3.1</version>
</dependency>
</dependencies>
</project>
2、admin
AdminUtil.java
package com.hsiehchou.es.admin;
import com.hsiehchou.common.file.FileCommon;
import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class AdminUtil {
private static Logger LOG = LoggerFactory.getLogger(AdminUtil.class);
public static void main(String[] args) throws Exception{
//create the index and its mapping
AdminUtil.buildIndexAndTypes("tanslator_test1111","tanslator_test1111", "es/mapping/test.json",3,1);
//index = 类型+日期
//查找类 Ctrl+Shift+Alt+N
}
/**
* @param index
* @param type
* @param path
* @param shard
* @param replication
* @return
* @throws Exception
*/
public static boolean buildIndexAndTypes(String index,String type,String path,int shard,int replication) throws Exception{
boolean flag ;
TransportClient client = ESClientUtils.getClient();
String mappingJson = FileCommon.getAbstractPath(path);
boolean indices = AdminUtil.createIndices(client, index, shard, replication);
if(indices){
LOG.info("创建索引"+ index + "成功");
flag = MappingUtil.addMapping(client, index, type, mappingJson);
}
else{
LOG.error("创建索引"+ index + "失败");
flag = false;
}
return flag;
}
/**
* @desc 判断需要创建的index是否存在
* */
public static boolean indexExists(TransportClient client,String index){
boolean ifExists = false;
try {
System.out.println("client===" + client);
IndicesExistsResponse existsResponse = client.admin().indices().prepareExists(index).execute().actionGet();
ifExists = existsResponse.isExists();
} catch (Exception e) {
e.printStackTrace();
LOG.error("判断index是否存在失败...");
return ifExists;
}
return ifExists;
}
/**
* 创建索引
* @param client
* @param index
* @param shard
* @param replication
* @return
*/
public static boolean createIndices(TransportClient client, String index, int shard , int replication){
if(!indexExists(client,index)) {
LOG.info("该index不存在,创建...");
CreateIndexResponse createIndexResponse =null;
try {
createIndexResponse = client.admin().indices().prepareCreate(index)
.setSettings(Settings.builder()
.put("index.number_of_shards", shard)
.put("index.number_of_replicas", replication)
.put("index.codec", "best_compression")
.put("refresh_interval", "30s"))
.execute().actionGet();
return createIndexResponse.isAcknowledged();
} catch (Exception e) {
LOG.error(null, e);
return false;
}
}
LOG.warn("该index " + index + " 已经存在...");
return false;
}
}
MappingUtil.java
package com.hsiehchou.es.admin;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
public class MappingUtil {
private static Logger LOG = LoggerFactory.getLogger(MappingUtil.class);
//关闭自动添加字段,关闭后索引数据中如果有多余字段不会修改mapping,默认true
private boolean dynamic = true;
public static XContentBuilder buildMapping(String tableName) throws IOException {
XContentBuilder builder = null;
try {
builder = XContentFactory.jsonBuilder().startObject()
.startObject(tableName)
.startObject("_source").field("enabled", true).endObject()
.startObject("properties")
.startObject("id").field("type", "long").endObject()
.startObject("sn").field("type", "text").endObject()
.endObject()
.endObject()
.endObject();
} catch (IOException e) {
e.printStackTrace();
}
return builder;
}
public static boolean addMapping(TransportClient client, String index, String type, String jsonString){
PutMappingResponse putMappingResponse = null;
try {
PutMappingRequest mappingRequest = new PutMappingRequest(index)
.type(type).source(JSON.parseObject(jsonString));
putMappingResponse = client.admin().indices().putMapping(mappingRequest).actionGet();
} catch (Exception e) {
LOG.error(null,e);
e.printStackTrace();
LOG.error("添加" + type + "的mapping失败....",e);
return false;
}
boolean success = putMappingResponse.isAcknowledged();
if (success){
LOG.info("创建" + type + "的mapping成功....");
return success;
}
return success;
}
public static void main(String[] args) throws Exception {
/*String singleConf = ConsulConfigUtil.getSingleConf("es6.1.0/mapping/http");
int i = singleConf.length() / 2;
System.out.println(i);*/
}
}
3、client
ESClientUtils.java
package com.hsiehchou.es.client;
import com.hsiehchou.common.config.ConfigUtil;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.Serializable;
import java.net.InetAddress;
import java.util.Properties;
/**
* ES 客户端获取
*/
public class ESClientUtils implements Serializable{
private static Logger LOG = LoggerFactory.getLogger(ESClientUtils.class);
private volatile static TransportClient esClusterClient;
private ESClientUtils(){}
private static Properties properties;
static {
properties = ConfigUtil.getInstance().getProperties("es/es_cluster.properties");
}
public static TransportClient getClient(){
System.setProperty("es.set.netty.runtime.available.processors", "false");
String clusterName = properties.getProperty("es.cluster.name");
String clusterNodes1 = properties.getProperty("es.cluster.nodes1");
String clusterNodes2 = properties.getProperty("es.cluster.nodes2");
String clusterNodes3 = properties.getProperty("es.cluster.nodes3");
LOG.info("clusterName:"+ clusterName);
LOG.info("clusterNodes:"+ clusterNodes1);
LOG.info("clusterNodes:"+ clusterNodes2);
LOG.info("clusterNodes:"+ clusterNodes3);
if(esClusterClient==null){
synchronized (ESClientUtils.class){
if(esClusterClient==null){
try{
Settings settings = Settings.builder()
.put("cluster.name", clusterName)
//.put("searchguard.ssl.transport.enabled", false)
//.put("xpack.security.user", "sc_xy_mn_es:xy@66812.com")
// .put("transport.type","netty3")
// .put("http.type","netty3")
.put("client.transport.sniff",true).build();//开启自动嗅探功能
esClusterClient = new PreBuiltTransportClient(settings)
.addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes1), 9300))
.addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes2), 9300))
.addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes3), 9300));
LOG.info("esClusterClient========" + esClusterClient.listedNodes());
}catch (Exception e){
LOG.error("获取客户端失败",e);
}finally {
}
}
}
}
return esClusterClient;
}
public static void main(String[] args) {
TransportClient client = ESClientUtils.getClient();
System.out.println(client);
}
}
4、jest/service
IndexTypeUtil.java
package com.hsiehchou.es.jest.service;
import com.hsiehchou.common.config.JsonReader;
import io.searchbox.client.JestClient;
public class IndexTypeUtil {
public static void main(String[] args) {
IndexTypeUtil.createIndexAndType("tanslator","es/mapping/tanslator.json");
// IndexTypeUtil.createIndexAndType("task");
// IndexTypeUtil.createIndexAndType("ability");
// IndexTypeUtil.createIndexAndType("paper");
}
public static void createIndexAndType(String index,String jsonPath){
try{
JestClient jestClient = JestService.getJestClient();
JestService.createIndex(jestClient, index);
JestService.createIndexMapping(jestClient,index,index,getSourceFromJson(jsonPath));
}catch (Exception e){
e.printStackTrace();
//LOG.error("创建索引失败",e);
}
}
public static String getSourceFromJson(String path){
return JsonReader.readJson(path);
}
public static String getSource(String index){
if(index.equals("task")){
return "{\"_source\": {\n" +
" \"enabled\": true\n" +
" },\n" +
" \"properties\": {\n" +
" \"taskwordcount\": {\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"taskprice\": {\n" +
" \"type\": \"float\"\n" +
" }\n" +
" }\n" +
"}";
}
if(index.equals("tanslator")){
return "{\n" +
" \"_source\": {\n" +
" \"enabled\": true\n" +
" },\n" +
" \"properties\": {\n" +
" \"birthday\": {\n" +
" \"type\": \"text\",\n" +
" \"fields\": {\n" +
" \"keyword\": {\n" +
" \"ignore_above\": 256,\n" +
" \"type\": \"keyword\"\n" +
" }\n" +
" }\n" +
" },\n" +
" \"createtime\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"updatetime\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"avgcooperation\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"cooperationwordcount\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"cooperation\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"cooperationtime\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"age\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"industry\": {\n" +
" \"type\": \"nested\",\n" +
" \"properties\": {\n" +
" \"industryname\": {\n" +
" \"type\": \"text\",\n" +
" \"fields\": {\n" +
" \"keyword\": {\n" +
" \"ignore_above\": 256,\n" +
" \"type\": \"keyword\"\n" +
" }\n" +
" }\n" +
" },\n" +
" \"count\": {\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"industryid\": {\n" +
" \"type\": \"text\",\n" +
" \"fields\": {\n" +
" \"keyword\": {\n" +
" \"ignore_above\": 256,\n" +
" \"type\": \"keyword\"\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"\n" +
" }\n" +
"}";
}
return "";
}
}
JestService.java
package com.hsiehchou.es.jest.service;
import com.hsiehchou.common.file.FileCommon;
import com.google.gson.GsonBuilder;
import io.searchbox.action.Action;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.JestResult;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.*;
import io.searchbox.indices.CreateIndex;
import io.searchbox.indices.DeleteIndex;
import io.searchbox.indices.IndicesExists;
import io.searchbox.indices.mapping.GetMapping;
import io.searchbox.indices.mapping.PutMapping;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.List;
import java.util.Map;
public class JestService {
private static Logger LOG = LoggerFactory.getLogger(JestService.class);
/**
* 获取JestClient对象
*
* @return
*/
public static JestClient getJestClient() {
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig
.Builder("http://hadoop1:9200")
//.defaultCredentials("sc_xy_mn_es","xy@66812.com")
.gson(new GsonBuilder().setDateFormat("yyyy-MM-dd'T'hh:mm:ss").create())
.connTimeout(1500)
.readTimeout(3000)
.multiThreaded(true)
.build());
return factory.getObject();
}
public static void main(String[] args) throws Exception {
JestClient jestClient = null;
// Map<String, Long> stringLongMap = null;
List<Map<String, Object>> maps = null;
try {
jestClient = JestService.getJestClient();
/* SearchResult aggregation = JestService.aggregation(jestClient,
"wechat",
"wechat",
"collect_time");
stringLongMap = ResultParse.parseAggregation(aggregation);*/
/* SearchResult search = search(jestClient,
"wechat",
"wechat",
"id",
"65a3d548bd3e42b1972191bc2bd2829b",
"collect_time",
"desc",
1,
2);*/
/*SearchResult search = search(jestClient,
"",
"",
"phone_mac",
"aa-aa-aa-aa-aa-aa",
"collect_time",
"asc",
1,
1000);*/
// System.out.println(indexExists(jestClient,"wechat"));
System.out.println("wechat数据量:"+count(jestClient,"wechat","wechat"));
System.out.println(aggregation(jestClient,"wechat","wechat", "phone"));
String[] includes = new String[]{"latitude","longitude","collect_time"};
// try{
SearchResult search = JestService.search(jestClient,
"",
"",
"phone_mac.keyword",
"aa-aa-aa-aa-aa-aa",
"collect_time",
"asc",
1,
2000);
maps = ResultParse.parseSearchResultOnly(search);
System.out.println(maps.size());
System.out.println(maps);
} catch (Exception e) {
e.printStackTrace();
} finally {
JestService.closeJestClient(jestClient);
}
System.out.println(maps);
// } catch (Exception e) {
// e.printStackTrace();
// }finally {
// JestService.closeJestClient(jestClient);
// }
// System.out.println(stringLongMap);
}
/**
* 统计一个索引所有数据
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static Long count(JestClient jestClient,
String indexName,
String typeName) throws Exception {
Count count = new Count.Builder()
.addIndex(indexName)
.addType(typeName)
.build();
CountResult results = jestClient.execute(count);
return results.getCount().longValue();
}
/**
* 聚合分组查询
* @param jestClient
* @param indexName
* @param typeName
* @param field
* @return
* @throws Exception
*/
public static SearchResult aggregation(JestClient jestClient, String indexName, String typeName, String field) throws Exception {
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//分组聚合API
AggregationBuilder group1 = AggregationBuilders.terms("group1").field(field);
//group1.subAggregation(AggregationBuilders.terms("group2").field(query));
searchSourceBuilder.aggregation(group1);
searchSourceBuilder.size(0);
System.out.println(searchSourceBuilder.toString());
Search search = new Search.Builder(searchSourceBuilder.toString())
.addIndex(indexName)
.addType(typeName).build();
SearchResult result = jestClient.execute(search);
return result;
}
//基础封装
public static SearchResult search(
JestClient jestClient,
String indexName,
String typeName,
String field,
String fieldValue,
String sortField,
String sortValue,
int pageNumber,
int pageSize,
String[] includes) {
//构造一个查询体 封装的就是查询语句
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.fetchSource(includes,new String[0]);
//查询构造器
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if(StringUtils.isEmpty(field)){
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
}else{
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
}
searchSourceBuilder.query(boolQueryBuilder);
//定义分页
//从什么时候开始
searchSourceBuilder.from((pageNumber-1)*pageSize);
searchSourceBuilder.size(pageSize);
//设置排序
if("desc".equals(sortValue)){
searchSourceBuilder.sort(sortField,SortOrder.DESC);
}else{
searchSourceBuilder.sort(sortField,SortOrder.ASC);
}
System.out.println("sql =====" + searchSourceBuilder.toString());
//构造一个查询执行器
Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
//设置indexName typeName
if(StringUtils.isNotBlank(indexName)){
builder.addIndex(indexName);
}
if(StringUtils.isNotBlank(typeName)){
builder.addType(typeName);
}
Search build = builder.build();
SearchResult searchResult = null;
try {
searchResult = jestClient.execute(build);
} catch (IOException e) {
LOG.error("查询失败",e);
}
return searchResult;
}
//基础封装
public static SearchResult search(
JestClient jestClient,
String indexName,
String typeName,
String field,
String fieldValue,
String sortField,
String sortValue,
int pageNumber,
int pageSize) {
//构造一个查询体 封装的就是查询语句
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//查询构造器
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if(StringUtils.isEmpty(field)){
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
}else{
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
}
searchSourceBuilder.query(boolQueryBuilder);
//定义分页
//从什么时候开始
searchSourceBuilder.from((pageNumber-1)*pageSize);
searchSourceBuilder.size(pageSize);
//设置排序
if("desc".equals(sortValue)){
searchSourceBuilder.sort(sortField,SortOrder.DESC);
}else{
searchSourceBuilder.sort(sortField,SortOrder.ASC);
}
System.out.println("sql =====" + searchSourceBuilder.toString());
//构造一个查询执行器
Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
//设置indexName typeName
if(StringUtils.isNotBlank(indexName)){
builder.addIndex(indexName);
}
if(StringUtils.isNotBlank(typeName)){
builder.addType(typeName);
}
Search build = builder.build();
SearchResult searchResult = null;
try {
searchResult = jestClient.execute(build);
} catch (IOException e) {
LOG.error("查询失败",e);
}
return searchResult;
}
/* //基础封装
public static SearchResult search(
JestClient jestClient,
String indexName,
String typeName,
String field,
String fieldValue,
String sortField,
String sortValue,
int pageNumber,
int pageSize) {
//构造一个查询体 封装的就是查询语句
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//查询构造器
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if(StringUtils.isEmpty(field)){
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
}else{
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
}
searchSourceBuilder.query(boolQueryBuilder);
//定义分页
//从什么时候开始
searchSourceBuilder.from((pageNumber-1)*pageSize);
searchSourceBuilder.size(pageSize);
//设置排序
if("desc".equals(sortValue)){
searchSourceBuilder.sort(sortField,SortOrder.DESC);
}else{
searchSourceBuilder.sort(sortField,SortOrder.ASC);
}
System.out.println("sql =====" + searchSourceBuilder.toString());
//构造一个查询执行器
Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
//设置indexName typeName
if(StringUtils.isNotBlank(indexName)){
builder.addIndex(indexName);
}
if(StringUtils.isNotBlank(typeName)){
builder.addType(typeName);
}
Search build = builder.build();
SearchResult searchResult = null;
try {
searchResult = jestClient.execute(build);
} catch (IOException e) {
LOG.error("查询失败",e);
}
return searchResult;
}
*/
/**
* 判断索引是否存在
*
* @param jestClient
* @param indexName
* @return
* @throws Exception
*/
public static boolean indexExists(JestClient jestClient, String indexName) {
JestResult result = null;
try {
Action action = new IndicesExists.Builder(indexName).build();
result = jestClient.execute(action);
} catch (IOException e) {
LOG.error(null, e);
}
return result != null && result.isSucceeded();
}
/**
* 创建索引
*
* @param jestClient
* @param indexName
* @return
* @throws Exception
*/
public static boolean createIndex(JestClient jestClient, String indexName) throws Exception {
if (!JestService.indexExists(jestClient, indexName)) {
JestResult jr = jestClient.execute(new CreateIndex.Builder(indexName).build());
return jr.isSucceeded();
} else {
LOG.info("该索引已经存在");
return false;
}
}
public static boolean createIndexWithSettingsMapAndMappingsString(JestClient jestClient, String indexName, String type, String path) throws Exception {
// String mappingJson = "{\"type1\": {\"_source\":{\"enabled\":false},\"properties\":{\"field1\":{\"type\":\"keyword\"}}}}";
String mappingJson = FileCommon.getAbstractPath(path);
String realMappingJson = "{" + type + ":" + mappingJson + "}";
System.out.println(realMappingJson);
CreateIndex createIndex = new CreateIndex.Builder(indexName)
.mappings(realMappingJson)
.build();
JestResult jr = jestClient.execute(createIndex);
return jr.isSucceeded();
}
/**
* Put映射
*
* @param jestClient
* @param indexName
* @param typeName
* @param source
* @return
* @throws Exception
*/
public static boolean createIndexMapping(JestClient jestClient, String indexName, String typeName, String source) throws Exception {
PutMapping putMapping = new PutMapping.Builder(indexName, typeName, source).build();
JestResult jr = jestClient.execute(putMapping);
return jr.isSucceeded();
}
/**
* Get映射
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static String getIndexMapping(JestClient jestClient, String indexName, String typeName) throws Exception {
GetMapping getMapping = new GetMapping.Builder().addIndex(indexName).addType(typeName).build();
JestResult jr = jestClient.execute(getMapping);
return jr.getJsonString();
}
/**
* 索引文档
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static boolean index(JestClient jestClient, String indexName, String typeName, String idField, List<Map<String, Object>> listMaps) throws Exception {
Bulk.Builder bulk = new Bulk.Builder().defaultIndex(indexName).defaultType(typeName);
for (Map<String, Object> map : listMaps) {
if (map != null && map.containsKey(idField)) {
Object o = map.get(idField);
Index index = new Index.Builder(map).id(map.get(idField).toString()).build();
bulk.addAction(index);
}
}
BulkResult br = jestClient.execute(bulk.build());
return br.isSucceeded();
}
/**
* 索引文档
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static boolean indexString(JestClient jestClient, String indexName, String typeName, String idField, List<Map<String, String>> listMaps) throws Exception {
if (listMaps != null && listMaps.size() > 0) {
Bulk.Builder bulk = new Bulk.Builder().defaultIndex(indexName).defaultType(typeName);
for (Map<String, String> map : listMaps) {
if (map != null && map.containsKey(idField)) {
Index index = new Index.Builder(map).id(map.get(idField)).build();
bulk.addAction(index);
}
}
BulkResult br = jestClient.execute(bulk.build());
return br.isSucceeded();
} else {
return false;
}
}
/**
* 索引文档
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static boolean indexOne(JestClient jestClient, String indexName, String typeName, String id, Map<String, Object> map) {
Index.Builder builder = new Index.Builder(map);
builder.id(id);
builder.refresh(true);
Index index = builder.index(indexName).type(typeName).build();
try {
JestResult result = jestClient.execute(index);
if (result != null && !result.isSucceeded()) {
throw new RuntimeException(result.getErrorMessage() + "插入更新索引失败!");
}
} catch (Exception e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* 搜索文档
*
* @param jestClient
* @param indexName
* @param typeName
* @param query
* @return
* @throws Exception
*/
public static SearchResult search(JestClient jestClient, String indexName, String typeName, String query) throws Exception {
Search search = new Search.Builder(query)
.addIndex(indexName)
.addType(typeName)
.build();
return jestClient.execute(search);
}
/**
* Get文档
*
* @param jestClient
* @param indexName
* @param typeName
* @param id
* @return
* @throws Exception
*/
public static JestResult get(JestClient jestClient, String indexName, String typeName, String id) throws Exception {
Get get = new Get.Builder(indexName, id).type(typeName).build();
return jestClient.execute(get);
}
/**
* Delete索引
*
* @param jestClient
* @param indexName
* @return
* @throws Exception
*/
public boolean delete(JestClient jestClient, String indexName) throws Exception {
JestResult jr = jestClient.execute(new DeleteIndex.Builder(indexName).build());
return jr.isSucceeded();
}
/**
* Delete文档
*
* @param jestClient
* @param indexName
* @param typeName
* @param id
* @return
* @throws Exception
*/
public static boolean delete(JestClient jestClient, String indexName, String typeName, String id) throws Exception {
DocumentResult dr = jestClient.execute(new Delete.Builder(id).index(indexName).type(typeName).build());
return dr.isSucceeded();
}
/**
* 关闭JestClient客户端
*
* @param jestClient
* @throws Exception
*/
public static void closeJestClient(JestClient jestClient) {
if (jestClient != null) {
jestClient.shutdownClient();
}
}
public static String query = "{\n" +
" \"size\": 1,\n" +
" \"query\": {\n" +
" \"match\": {\n" +
" \"taskexcuteid\": \"89899143\"\n" +
" }\n" +
" },\n" +
" \"aggs\": {\n" +
" \"count\": {\n" +
" \"terms\": {\n" +
" \"field\": \"source.keyword\"\n" +
" },\n" +
" \"aggs\": {\n" +
" \"sum_price\": {\n" +
" \"sum\": {\n" +
" \"field\": \"taskprice\"\n" +
" }\n" +
" },\n" +
" \"sum_wordcount\": {\n" +
" \"sum\": {\n" +
" \"field\": \"taskwordcount\"\n" +
" }\n" +
" },\n" +
" \"avg_taskprice\": {\n" +
" \"avg\": {\n" +
" \"field\": \"taskprice\"\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
}
ResultParse.java
package com.hsiehchou.es.jest.service;
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonPrimitive;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestResult;
import io.searchbox.core.SearchResult;
import io.searchbox.core.search.aggregation.MetricAggregation;
import io.searchbox.core.search.aggregation.TermsAggregation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.*;
public class ResultParse {
private static Logger LOG = LoggerFactory.getLogger(ResultParse.class);
public static void main(String[] args) throws Exception {
JestClient jestClient = JestService.getJestClient();
/*long l = System.currentTimeMillis();
JestClient jestClient = JestClientUtil.getJestClient();
System.out.println(jestClient);
String json ="{\n" +
" \"size\": 1, \n" +
" \"query\": {\n" +
" \"query_string\": {\n" +
" \"query\": \"中文\"\n" +
" }\n" +
" },\n" +
" \"highlight\": {\n" +
" \"pre_tags\" : [ \"<red>\" ],\n" +
" \"post_tags\" : [ \"</red>\" ],\n" +
" \"fields\":{\n" +
" \"secondlanguage\": {}\n" +
" ,\"firstlanguage\": {}\n" +
" }\n" +
" }\n" +
"}";
SearchResult search = JestService.search(jestClient, ES_INDEX.TANSLATOR_TEST, ES_INDEX.TANSLATOR_TEST,json);
ResultParse.parseSearchResult(search);
jestClient.shutdownClient();
long l1 = System.currentTimeMillis();
System.out.println(l1-l);*/
}
public static Map<String,Object> parseGet(JestResult getResult){
Map<String,Object> map = null;
JsonObject jsonObject = getResult.getJsonObject().getAsJsonObject("_source");
if(jsonObject != null){
map = new HashMap<String,Object>();
//System.out.println(jsonObject);
Set<Map.Entry<String, JsonElement>> entries = jsonObject.entrySet();
for(Map.Entry<String, JsonElement> entry:entries){
JsonElement value = entry.getValue();
if(value.isJsonPrimitive()){
JsonPrimitive value1 = (JsonPrimitive) value;
// LOG.error("转换前==========" + value1);
if( value1.isString() ){
// LOG.error("转换后==========" + value1.getAsString());
map.put(entry.getKey(),value1.getAsString());
}else{
map.put(entry.getKey(),value1);
}
}else{
map.put(entry.getKey(),value);
}
}
}
return map;
}
public static Map<String,Object> parseGet2map(JestResult getResult){
JsonObject source = getResult.getJsonObject().getAsJsonObject("_source");
Gson gson = new Gson();
Map map = gson.fromJson(source, Map.class);
return map;
}
/**
* Extract only the _source of every hit from a SearchResult
* @param search
* @return a list of source maps, one entry per hit
*/
public static List<Map<String,Object>> parseSearchResultOnly(SearchResult search){
List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();
List<SearchResult.Hit<Object, Void>> hits = search.getHits(Object.class);
for(SearchResult.Hit<Object, Void> hit : hits){
Map<String,Object> source = (Map<String,Object>)hit.source;
list.add(source);
}
return list;
}
/**
* Parse the terms aggregation named "group1" from a SearchResult
* @param search
* @return map of bucket key -> document count
*/
public static Map<String,Long> parseAggregation(SearchResult search){
Map<String,Long> mapResult = new HashMap<>();
MetricAggregation aggregations = search.getAggregations();
TermsAggregation group1 = aggregations.getTermsAggregation("group1");
List<TermsAggregation.Entry> buckets = group1.getBuckets();
buckets.forEach(x->{
String key = x.getKey();
Long count = x.getCount();
mapResult.put(key,count);
});
return mapResult;
}
/**
* 解析listMap
* 结果格式为 {hits=0, total=0, data=[]}
* @param search
* @return
*/
public static Map<String,Object> parseSearchResult(SearchResult search){
Map<String,Object> map = new HashMap<String,Object>();
List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();
Long total = search.getTotal();
map.put("total",total);
List<SearchResult.Hit<Object, Void>> hits = search.getHits(Object.class);
map.put("hits",hits.size());
for(SearchResult.Hit<Object, Void> hit : hits){
Map<String, List<String>> highlight = hit.highlight;
Map<String,Object> source = (Map<String,Object>)hit.source;
source.put("highlight",highlight);
list.add(source);
}
map.put("data",list);
return map;
}
}
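Putting JestService and ResultParse together: a small hypothetical driver. The index, type and query body are sample values, and JestService.search(...) is the search wrapper referenced in the commented-out main above.
ResultParseDemo.java
package com.hsiehchou.es.jest.service;
import io.searchbox.client.JestClient;
import io.searchbox.core.SearchResult;
import java.util.Map;
public class ResultParseDemo {
    public static void main(String[] args) throws Exception {
        JestClient jestClient = JestService.getJestClient();
        // simple match query against the sample wechat index
        String json = "{\"size\": 10, \"query\": {\"match\": {\"phone_mac\": \"aa-aa-aa-aa-aa-aa\"}}}";
        SearchResult search = JestService.search(jestClient, "wechat", "wechat", json);
        // parseSearchResult returns a map with the keys total, hits and data
        Map<String, Object> result = ResultParse.parseSearchResult(search);
        System.out.println("total=" + result.get("total") + ", page=" + result.get("data"));
        JestService.closeJestClient(jestClient);
    }
}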
5、search
BuilderUtil.java
package com.hsiehchou.es.search;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class BuilderUtil {
private static Logger LOG = LoggerFactory.getLogger(BuilderUtil.class);
public static SearchRequestBuilder getSearchBuilder(TransportClient client, String index, String type){
SearchRequestBuilder builder = null;
try {
if (StringUtils.isNotBlank(index)) {
builder = client.prepareSearch(index.split(","));
} else {
builder = client.prepareSearch();
}
if (StringUtils.isNotBlank(type)) {
builder.setTypes(type.split(","));
}
} catch (Exception e) {
LOG.error(null, e);
}
return builder;
}
public static SearchRequestBuilder getSearchBuilder(TransportClient client, String[] indexs, String type){
SearchRequestBuilder builder = null;
try {
if (indexs.length > 0) {
// prepareSearch accepts all index names at once (varargs)
builder = client.prepareSearch(indexs);
} else {
builder = client.prepareSearch();
}
if (StringUtils.isNotBlank(type)) {
builder.setTypes(type);
}
} catch (Exception e) {
LOG.error(null, e);
}
return builder;
}
}
QueryUtil.java
package com.hsiehchou.es.search;
import com.hsiehchou.es.utils.UnicodeUtil;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
public class QueryUtil {
private static Logger LOG = LoggerFactory.getLogger(QueryUtil.class);
/**
* EQ 等於
* NEQ 不等於
* GE 大于等于
* GT 大于
* LE 小于等于
* LT 小于
* RANGE 区间范围
*/
public static enum OPREATOR {EQ, NEQ,WILDCARD, GE, LE, GT, LT, FUZZY, RANGE, IN, PREFIX}
/**
* @param paramMap
* @return
*/
public static BoolQueryBuilder getSearchParam(Map<OPREATOR, Map<String, Object>> paramMap) {
BoolQueryBuilder qb = QueryBuilders.boolQuery();
if (null != paramMap && !paramMap.isEmpty()) {
for (Map.Entry<OPREATOR, Map<String, Object>> paramEntry : paramMap.entrySet()) {
OPREATOR key = paramEntry.getKey();
Map<String, Object> fieldMap = paramEntry.getValue();
for (Map.Entry<String, Object> fieldEntry : fieldMap.entrySet()) {
String field = fieldEntry.getKey();
Object value = fieldEntry.getValue();
switch (key) {
case EQ:/**等於查詢 equale**/
qb.must(QueryBuilders.matchPhraseQuery(field, value).slop(0));
break;
case NEQ:/**不等於查詢 not equale**/
qb.mustNot(QueryBuilders.matchQuery(field, value));
break;
case GE: /**大于等于查詢 great than or equal to**/
qb.must(QueryBuilders.rangeQuery(field).gte(value));
break;
case LE: /**小于等于查詢 less than or equal to**/
qb.must(QueryBuilders.rangeQuery(field).lte(value));
break;
case GT: /**大于查詢**/
qb.must(QueryBuilders.rangeQuery(field).gt(value));
break;
case LT: /**小于查詢**/
qb.must(QueryBuilders.rangeQuery(field).lt(value));
break;
case FUZZY:
String text = String.valueOf(value);
if (!UnicodeUtil.hasChinese(text)) {
text = "*" + text + "*";
}
text = QueryParser.escape(text);
qb.must(new QueryStringQueryBuilder(text).field(field));
break;
case RANGE: /**区间查詢**/
String[] split = value.toString().split(",");
if(split.length==2){
qb.must(QueryBuilders.rangeQuery(field).from(Long.valueOf(split[0]))
.to(Long.valueOf(split[1])));
}
/* if (value instanceof Map) {
Map<String, Object> rangMap = (Map<String, Object>) value;
qb.must(QueryBuilders.rangeQuery(field).from(rangMap.get("ge"))
.to(rangMap.get("le")));
}*/
break;
case PREFIX: /**前缀查詢**/
qb.must(QueryBuilders.prefixQuery(field, String.valueOf(value)));
break;
case IN:
qb.must(QueryBuilders.termsQuery(field, (Object[]) value));
break;
default:
qb.must(QueryBuilders.matchQuery(field, value));
break;
}
}
}
}
return qb;
}
}
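A sketch of how the OPREATOR parameter map is assembled before calling getSearchParam. Field names and values are samples taken from the common fields defined earlier; note that RANGE expects a "from,to" string of two numbers, matching the split logic above.
QueryUtilDemo.java
package com.hsiehchou.es.search;
import org.elasticsearch.index.query.BoolQueryBuilder;
import java.util.HashMap;
import java.util.Map;
public class QueryUtilDemo {
    public static void main(String[] args) {
        Map<QueryUtil.OPREATOR, Map<String, Object>> paramMap = new HashMap<>();
        // EQ translates to an exact phrase match on the field
        Map<String, Object> eq = new HashMap<>();
        eq.put("phone_mac", "aa-aa-aa-aa-aa-aa");
        paramMap.put(QueryUtil.OPREATOR.EQ, eq);
        // RANGE takes "from,to" as a single comma-separated string (sample epoch seconds)
        Map<String, Object> range = new HashMap<>();
        range.put("collect_time", "1561651200,1561737600");
        paramMap.put(QueryUtil.OPREATOR.RANGE, range);
        BoolQueryBuilder qb = QueryUtil.getSearchParam(paramMap);
        // prints the generated bool query as JSON
        System.out.println(qb);
    }
}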
ResponseParse.java
package com.hsiehchou.es.search;
import org.elasticsearch.action.get.GetResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
public class ResponseParse {
private static Logger LOG = LoggerFactory.getLogger(ResponseParse.class);
public static Map<String, Object> parseGetResponse(GetResponse getResponse){
Map<String, Object> source = null;
try {
source = getResponse.getSource();
} catch (Exception e) {
LOG.error(null,e);
}
return source;
}
}
SearchUtil.java
package com.hsiehchou.es.search;
import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.action.get.GetRequestBuilder;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
public class SearchUtil {
private static Logger LOG = LoggerFactory.getLogger(SearchUtil.class);
private static TransportClient client = ESClientUtils.getClient();
public static void main(String[] args) {
TransportClient client = ESClientUtils.getClient();
List<Map<String, Object>> maps = searchSingleData(client, "wechat", "wechat", "phone_mac", "aa-aa-aa-aa-aa-aa");
System.out.println(maps);
/* long l = System.currentTimeMillis();
searchSingleData("tanslator", "tanslator","4e1117d7-c434-48a7-9134-45f7c90f94ee_TR1100397895_2");
System.out.println("消耗时间" + (System.currentTimeMillis() - l));
long lll = System.currentTimeMillis();
searchSingleData("tanslator", "tanslator","4e1117d7-c434-48a7-9134-45f7c90f94ee_TR1100397895_2");
System.out.println("消耗时间" + (System.currentTimeMillis() - lll));
long ll = System.currentTimeMillis();
List<Map<String, Object>> maps = searchSingleData(client,"tanslator", "tanslator", "iolid", "TR1100397895");
System.out.println("消耗时间" + (System.currentTimeMillis() - ll));
System.out.println(maps);*/
}
/**
* Fetch a single document by its id
* @param index index name
* @param type type name
* @param id document id
* @return
*/
public static GetResponse searchSingleData(String index, String type, String id) {
GetResponse response = null;
try {
GetRequestBuilder builder = null;
builder = client.prepareGet(index, type, id);
response = builder.execute().actionGet();
} catch (Exception e) {
LOG.error(null, e);
}
return response;
}
/**
* @param index
* @param type
* @param field
* @param value
* @return
*/
public static List<Map<String, Object>> searchSingleData(TransportClient client,String index, String type,String field, String value) {
List<Map<String, Object>> result = new ArrayList<>();
try {
SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client,index,type);
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery(field, value);
builder.setQuery(matchQueryBuilder).setExplain(false);
SearchResponse searchResponse = builder.execute().actionGet();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit sh : searchHists) {
result.add(sh.getSourceAsMap());
}
} catch (Exception e) {
e.printStackTrace();
LOG.error(null, e);
}
return result;
}
/**
* 多条件查詢
* @param index
* @param type
* @param paramMap 组合查询条件
* @return
*/
public static SearchResponse searchListData(String index, String type,
Map<QueryUtil.OPREATOR,Map<String,Object>> paramMap) {
SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client,index,type);
builder.setQuery(QueryUtil.getSearchParam(paramMap)).setExplain(false);
SearchResponse searchResponse = builder.get();
return searchResponse;
}
/**
* 多条件查詢
* @param index
* @param type
* @param paramMap 组合查询条件
* @return
*/
public static SearchResponse searchListData1(String index, String type, Map<String,String> paramMap) {
// sketch: AND all field/value pairs from paramMap together as match queries
BoolQueryBuilder qb = QueryBuilders.boolQuery();
if (paramMap != null) {
paramMap.forEach((field, value) -> qb.must(QueryBuilders.matchQuery(field, value)));
}
SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client, index, type);
builder.setQuery(qb).setExplain(false);
return builder.get();
}
}
6、utils
ESresultUtil.java
package com.hsiehchou.es.utils;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
public class ESresultUtil {
private static Logger LOG = LoggerFactory.getLogger(ESresultUtil.class);
public static Long getLong(Map<String,Object> esMAp,String field){
Long valueLong = 0L;
if(esMAp!=null && esMAp.size()>0){
if(esMAp.containsKey(field)){
Object value = esMAp.get(field);
if(value!=null && StringUtils.isNotBlank(value.toString())){
valueLong = Long.valueOf(value.toString());
}
}
}
return valueLong;
}
}
UnicodeUtil.java
package com.hsiehchou.es.utils;
import java.util.regex.Pattern;
public class UnicodeUtil {
// 根据Unicode编码完美的判断中文汉字和符号
private static boolean isChinese(char c) {
Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
|| ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
|| ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
|| ub == Character.UnicodeBlock.GENERAL_PUNCTUATION) {
return true;
}
return false;
}
// 完整的判断中文汉字和符号
public static boolean isChinese(String strName) {
char[] ch = strName.toCharArray();
for (int i = 0; i < ch.length; i++) {
char c = ch[i];
if (isChinese(c)) {
return true;
}
}
return false;
}
// Same check as isChinese(String); kept so existing callers of either name keep working
public static boolean hasChinese(String strName) {
return isChinese(strName);
}
// 只能判断部分CJK字符(CJK统一汉字)
public static boolean isChineseByREG(String str) {
if (str == null) {
return false;
}
Pattern pattern = Pattern.compile("[\\u4E00-\\u9FBF]+");
return pattern.matcher(str.trim()).find();
}
// 只能判断部分CJK字符(CJK统一汉字)
/* public static boolean isChineseByName(String str) {
if (str == null) {
return false;
}
// 大小写不同:\\p 表示包含,\\P 表示不包含
// \\p{Cn} 的意思为 Unicode 中未被定义字符的编码,\\P{Cn} 就表示 Unicode中已经被定义字符的编码
String reg = "\\p{InCJK Unified Ideographs}&&\\P{Cn}";
Pattern pattern = Pattern.compile(reg);
return pattern.matcher(str.trim()).find();
}*/
public static void main(String[] args) {
System.out.println(hasChinese("aa表aa"));
}
}
7、V2
ElasticSearchService.java
package com.hsiehchou.es.V2;
import com.hsiehchou.es.client.ESClientUtils;
import org.apache.commons.collections.map.HashedMap;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsRequest;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import java.util.*;
/**
* ES检索封装
*/
public class ElasticSearchService {
private final static int MAX = 10000;
private static TransportClient client = ESClientUtils.getClient();
/**
* 功能描述:新建索引
* @param indexName 索引名
*/
public void createIndex(String indexName) {
client.admin().indices().create(new CreateIndexRequest(indexName))
.actionGet();
}
/**
* 功能描述:新建索引
* @param index 索引名
* @param type 类型
*/
public void createIndex(String index, String type) {
client.prepareIndex(index, type).setSource().get();
}
/**
* 功能描述:删除索引
* @param index 索引名
*/
public void deleteIndex(String index) {
if (indexExist(index)) {
DeleteIndexResponse dResponse = client.admin().indices().prepareDelete(index)
.execute().actionGet();
if (!dResponse.isAcknowledged()) {
}
} else {
}
}
/**
* 功能描述:验证索引是否存在
* @param index 索引名
*/
public boolean indexExist(String index) {
IndicesExistsRequest inExistsRequest = new IndicesExistsRequest(index);
IndicesExistsResponse inExistsResponse = client.admin().indices()
.exists(inExistsRequest).actionGet();
return inExistsResponse.isExists();
}
/**
* 功能描述:插入数据
* @param index 索引名
* @param type 类型
* @param json 数据
*/
public void insertData(String index, String type, String json) {
client.prepareIndex(index, type)
.setSource(json)
.get();
}
/**
* 功能描述:插入数据
* @param index 索引名
* @param type 类型
* @param _id 数据id
* @param json 数据
*/
public void insertData(String index, String type, String _id, String json) {
client.prepareIndex(index, type).setId(_id)
.setSource(json)
.get();
}
/**
* 功能描述:更新数据
* @param index 索引名
* @param type 类型
* @param _id 数据id
* @param json 数据
*/
public void updateData(String index, String type, String _id, String json) throws Exception {
try {
UpdateRequest updateRequest = new UpdateRequest(index, type, _id)
.doc(json);
client.update(updateRequest).get();
} catch (Exception e) {
//throw new MessageException("update data failed.", e);
}
}
/**
* 功能描述:删除数据
* @param index 索引名
* @param type 类型
* @param _id 数据id
*/
public void deleteData(String index, String type, String _id) {
client.prepareDelete(index, type, _id)
.get();
}
/**
* 功能描述:批量插入数据
* @param index 索引名
* @param type 类型
* @param data (_id 主键, json 数据)
*/
public void bulkInsertData(String index, String type, Map<String, String> data) {
BulkRequestBuilder bulkRequest = client.prepareBulk();
data.forEach((param1, param2) -> {
bulkRequest.add(client.prepareIndex(index, type, param1)
.setSource(param2)
);
});
bulkRequest.get();
}
/**
* 功能描述:批量插入数据
* @param index 索引名
* @param type 类型
* @param jsonList 批量数据
*/
public void bulkInsertData(String index, String type, List<String> jsonList) {
BulkRequestBuilder bulkRequest = client.prepareBulk();
jsonList.forEach(item -> {
bulkRequest.add(client.prepareIndex(index, type)
.setSource(item)
);
});
bulkRequest.get();
}
/**
* 功能描述:查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
*/
public List<Map<String, Object>> search(String index, String type, ESQueryBuilderConstructor constructor) {
List<Map<String, Object>> list = new ArrayList<>();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
searchRequestBuilder.setQuery(constructor.listBuilders());
//返回条目数
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit sh : searchHists) {
list.add(sh.getSourceAsMap());
}
return list;
}
/**
* 功能描述:查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
*/
public Map<String,Object> searchCountAndMessage(String index, String type, ESQueryBuilderConstructor constructor) {
Map<String,Object> map = new HashMap<String,Object>();
List<Map<String, Object>> list = new ArrayList<>();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
searchRequestBuilder.setQuery(constructor.listBuilders());
//返回条目数
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
long totalHits = searchResponse.getHits().getTotalHits();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit sh : searchHists) {
list.add(sh.getSourceAsMap());
}
map.put("total",(long)searchHists.length);
map.put("count",totalHits);
map.put("data",list);
return map;
}
/**
* 功能描述:查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
*/
public Map<String,Object> searchCountAndMessageNew(String index, String type, ESQueryBuilderConstructorNew constructor) {
Map<String,Object> map = new HashMap<String,Object>();
List<Map<String, Object>> list = new ArrayList<>();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
List<SortBuilder> sortBuilderList = constructor.getSortBuilderList();
if(sortBuilderList!=null && sortBuilderList.size()>0){
sortBuilderList.forEach(sortBuilder->{
searchRequestBuilder.addSort(sortBuilder);
});
}
//设置查询体
searchRequestBuilder.setQuery(constructor.listBuilders());
//返回条目数
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
//设置高亮
HighlightBuilder highlightBuilder = new HighlightBuilder();
List<String> highLighterFields = constructor.getHighLighterFields();
if(highLighterFields.size()>0){
highLighterFields.forEach(field -> {
highlightBuilder.field(field);
});
}
highlightBuilder.preTags("<font color=\"red\">");
highlightBuilder.postTags("</font>");
SearchResponse searchResponse = searchRequestBuilder.highlighter(highlightBuilder).execute().actionGet();
long totalHits = searchResponse.getHits().getTotalHits();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit hit : searchHists) {
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
//获取高亮结果
Set<String> set = highlightFields.keySet();
for (String str : set) {
Text[] fragments = highlightFields.get(str).getFragments();
String st1r="";
for(Text text:fragments){
st1r = st1r + text.toString();
}
sourceAsMap.put(str,st1r);
System.out.println("str(==============" + st1r);
}
list.add(sourceAsMap);
}
map.put("total",(long)searchHists.length);
map.put("count",totalHits);
map.put("data",list);
return map;
}
/**
* 功能描述:统计查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
* @param groupBy 统计字段
*/
public Map<Object, Object> statSearch(String index, String type, ESQueryBuilderConstructor constructor, String groupBy) {
Map<Object, Object> map = new HashedMap();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
if (null != constructor) {
searchRequestBuilder.setQuery(constructor.listBuilders());
} else {
searchRequestBuilder.setQuery(QueryBuilders.matchAllQuery());
}
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse sr = searchRequestBuilder.addAggregation(
AggregationBuilders.terms("agg").field(groupBy)
).get();
Terms stateAgg = sr.getAggregations().get("agg");
Iterator<? extends Terms.Bucket> iter = stateAgg.getBuckets().iterator();
while (iter.hasNext()) {
Terms.Bucket gradeBucket = iter.next();
map.put(gradeBucket.getKey(), gradeBucket.getDocCount());
}
return map;
}
/**
* 功能描述:统计查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
* @param agg 自定义计算
*/
public Map<Object, Object> statSearch(String index, String type, ESQueryBuilderConstructor constructor, AggregationBuilder agg) {
if (agg == null) {
return null;
}
Map<Object, Object> map = new HashedMap();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
if (null != constructor) {
searchRequestBuilder.setQuery(constructor.listBuilders());
} else {
searchRequestBuilder.setQuery(QueryBuilders.matchAllQuery());
}
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse sr = searchRequestBuilder.addAggregation(
agg
).get();
Terms stateAgg = sr.getAggregations().get("agg");
Iterator<? extends Terms.Bucket> iter = stateAgg.getBuckets().iterator();
while (iter.hasNext()) {
Terms.Bucket gradeBucket = iter.next();
map.put(gradeBucket.getKey(), gradeBucket.getDocCount());
}
return map;
}
/**
* 功能描述:关闭链接
*/
public void close() {
client.close();
}
public static void test() {
try{
ElasticSearchService service = new ElasticSearchService();
ESQueryBuilderConstructorNew constructor = new ESQueryBuilderConstructorNew();
constructor.must(new ESQueryBuilders().bool(QueryBuilders.boolQuery()));
constructor.must(new ESQueryBuilders().match("secondlanguage", "4"));
constructor.must(new ESQueryBuilders().match("secondlanguage", "4"));
constructor.should(new ESQueryBuilders().match("source", "5"));
constructor.should(new ESQueryBuilders().match("source", "5"));
service.searchCountAndMessageNew("", "", constructor);
}catch (Exception e){
e.printStackTrace();
}
}
public static void main(String[] args) {
try {
ElasticSearchService service = new ElasticSearchService();
ESQueryBuilderConstructor constructor = new ESQueryBuilderConstructor();
/* constructor.must(new ESQueryBuilders().term("gender", "f").range("age", 20, 50));
constructor.should(new ESQueryBuilders().term("gender", "f").range("age", 20, 50).fuzzy("age", 20));
constructor.mustNot(new ESQueryBuilders().term("gender", "m"));
constructor.setSize(15); //查询返回条数,最大 10000
constructor.setFrom(11); //分页查询条目起始位置, 默认0
constructor.setAsc("age"); //排序
List<Map<String, Object>> list = service.search("bank", "account", constructor);
Map<Object, Object> map = service.statSearch("bank", "account", constructor, "state");*/
constructor.must(new ESQueryBuilders().match("id", "WE16000190TR"));
List<Map<String, Object>> list = service.search("test01", "test01", constructor);
for(Map<String, Object> map : list){
System.out.println(map);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
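The AggregationBuilder overload of statSearch reads its result back under the fixed name "agg", so a caller-supplied aggregation has to be registered with exactly that name. A hedged sketch with sample index, type and field names:
StatSearchDemo.java
package com.hsiehchou.es.V2;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import java.util.Map;
public class StatSearchDemo {
    public static void main(String[] args) {
        ElasticSearchService service = new ElasticSearchService();
        ESQueryBuilderConstructor constructor = new ESQueryBuilderConstructor();
        constructor.must(new ESQueryBuilders().match("phone_mac", "aa-aa-aa-aa-aa-aa"));
        constructor.setSize(0); // only the aggregation buckets are needed
        // the terms aggregation must be named "agg" because statSearch looks it up by that name
        Map<Object, Object> counts = service.statSearch(
                "wechat", "wechat", constructor,
                AggregationBuilders.terms("agg").field("object_username.keyword"));
        counts.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}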
ESCriterion.java
package com.hsiehchou.es.V2;
import org.elasticsearch.index.query.QueryBuilder;
import java.util.List;
/**
* 条件接口
*/
public interface ESCriterion {
public enum Operator {
PREFIX, /** prefix query **/
MATCH, /** match query **/
MATCH_PHRASE, /** exact phrase match **/
MULTI_MATCH, /** multi-field match **/
TERM, /** term query **/
TERMS, /** terms query **/
RANGE, /** range query **/
GTE, /** greater than or equal **/
LTE, /** less than or equal **/
FUZZY, /** fuzzy query **/
QUERY_STRING, /** query_string query **/
MISSING, /** missing-field query **/
BOOL /** bool query **/
}
public enum MatchMode {
START, END, ANYWHERE
}
public enum Projection {
MAX, MIN, AVG, LENGTH, SUM, COUNT
}
public List<QueryBuilder> listBuilders();
}
ESQueryBuilderConstructor.java
package com.hsiehchou.es.V2;
import org.apache.commons.collections.CollectionUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import java.util.ArrayList;
import java.util.List;
/**
* 查询条件容器
*/
public class ESQueryBuilderConstructor {
private int size = Integer.MAX_VALUE;
private int from = 0;
private String asc;
private String desc;
//查询条件容器
private List<ESCriterion> mustCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> shouldCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> mustNotCriterions = new ArrayList<ESCriterion>();
//构造builder
public QueryBuilder listBuilders() {
int count = mustCriterions.size() + shouldCriterions.size() + mustNotCriterions.size();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
QueryBuilder queryBuilder = null;
if (count >= 1) {
//must容器
if (!CollectionUtils.isEmpty(mustCriterions)) {
for (ESCriterion criterion : mustCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.must(builder);
}
}
}
//should容器
if (!CollectionUtils.isEmpty(shouldCriterions)) {
for (ESCriterion criterion : shouldCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.should(builder);
}
}
}
//must not 容器
if (!CollectionUtils.isEmpty(mustNotCriterions)) {
for (ESCriterion criterion : mustNotCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.mustNot(builder);
}
}
}
return queryBuilder;
} else {
return null;
}
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructor must(ESCriterion criterion){
if(criterion!=null){
mustCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructor should(ESCriterion criterion){
if(criterion!=null){
shouldCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructor mustNot(ESCriterion criterion){
if(criterion!=null){
mustNotCriterions.add(criterion);
}
return this;
}
public int getSize() {
return size;
}
public void setSize(int size) {
this.size = size;
}
public String getAsc() {
return asc;
}
public void setAsc(String asc) {
this.asc = asc;
}
public String getDesc() {
return desc;
}
public void setDesc(String desc) {
this.desc = desc;
}
public int getFrom() {
return from;
}
public void setFrom(int from) {
this.from = from;
}
}
ESQueryBuilderConstructorNew.java
package com.hsiehchou.es.V2;
import org.apache.commons.collections.CollectionUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
* 查询条件容器
*/
public class ESQueryBuilderConstructorNew {
private List<String> highLighterFields = new ArrayList<String>();
private int size = Integer.MAX_VALUE;
private int from = 0;
private List<SortBuilder> sortBuilderList;
public List<SortBuilder> getSortBuilderList() {
return sortBuilderList;
}
public void setSortBuilderList(List<SortBuilder> sortBuilderList) {
this.sortBuilderList = sortBuilderList;
}
private Map<String,List<String>> sortMap;
//查询条件容器
private List<ESCriterion> mustCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> shouldCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> mustNotCriterions = new ArrayList<ESCriterion>();
//构造builder
public QueryBuilder listBuilders() {
int count = mustCriterions.size() + shouldCriterions.size() + mustNotCriterions.size();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
QueryBuilder queryBuilder = null;
if (count >= 1) {
//must容器
if (!CollectionUtils.isEmpty(mustCriterions)) {
for (ESCriterion criterion : mustCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.must(builder);
}
}
}
//should容器
if (!CollectionUtils.isEmpty(shouldCriterions)) {
for (ESCriterion criterion : shouldCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.should(builder);
}
}
}
//must not 容器
if (!CollectionUtils.isEmpty(mustNotCriterions)) {
for (ESCriterion criterion : mustNotCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.mustNot(builder);
}
}
}
return queryBuilder;
} else {
return null;
}
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructorNew must(ESCriterion criterion){
if(criterion!=null){
mustCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructorNew should(ESCriterion criterion){
if(criterion!=null){
shouldCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructorNew mustNot(ESCriterion criterion){
if(criterion!=null){
mustNotCriterions.add(criterion);
}
return this;
}
public List<String> getHighLighterFields() {
return highLighterFields;
}
public void setHighLighterFields(List<String> highLighterFields) {
this.highLighterFields = highLighterFields;
}
public int getSize() {
return size;
}
public void setSize(int size) {
this.size = size;
}
public Map<String, List<String>> getSortMap() {
return sortMap;
}
public void setSortMap(Map<String, List<String>> sortMap) {
this.sortMap = sortMap;
}
public int getFrom() {
return from;
}
public void setFrom(int from) {
this.from = from;
}
}
ESQueryBuilders.java
package com.hsiehchou.es.V2;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.NestedQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
/**
* 条件构造器
*/
public class ESQueryBuilders implements ESCriterion{
private List<QueryBuilder> list = new ArrayList<QueryBuilder>();
/**
* 功能描述:match 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders match(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.MATCH).toBuilder());
return this;
}
/**
* match_phrase query (exact phrase match)
* @param field field name
* @param value value
*/
public ESQueryBuilders match_phrase(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.MATCH_PHRASE).toBuilder());
return this;
}
/**
* multi_match query across several fields
* @param value value
* @param fieldNames field names
*/
public ESQueryBuilders multi_match(Object value , String... fieldNames ) {
String[] fields = fieldNames;
list.add(new ESSimpleExpression (value, Operator.MULTI_MATCH,fields).toBuilder());
return this;
}
/**
* 功能描述:Term 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders term(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.TERM).toBuilder());
return this;
}
/**
* 功能描述:Terms 查询
* @param field 字段名
* @param values 集合值
*/
public ESQueryBuilders terms(String field, Collection<Object> values) {
list.add(new ESSimpleExpression (field, values).toBuilder());
return this;
}
/**
* 功能描述:fuzzy 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders fuzzy(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.FUZZY).toBuilder());
return this;
}
/**
* 功能描述:Range 查询
* @param from 起始值
* @param to 末尾值
*/
public ESQueryBuilders range(String field, Object from, Object to) {
list.add(new ESSimpleExpression (field, from, to).toBuilder());
return this;
}
/**
* 功能描述:GTE 大于等于查询
* @param
*/
public ESQueryBuilders gte(String field, Object num) {
list.add(new ESSimpleExpression (field, num,Operator.GTE).toBuilder());
return this;
}
/**
* 功能描述:LTE 小于等于查询
* @param
*/
public ESQueryBuilders lte(String field, Object num) {
list.add(new ESSimpleExpression (field, num,Operator.LTE).toBuilder());
return this;
}
/**
* 功能描述:prefix 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders prefix(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.PREFIX).toBuilder());
return this;
}
/**
* query_string query
* @param queryString query string
*/
public ESQueryBuilders queryString(String queryString) {
list.add(new ESSimpleExpression (queryString, Operator.QUERY_STRING).toBuilder());
return this;
}
/**
* bool query: wrap an already-built BoolQueryBuilder
* @param boolQueryBuilder
*/
public ESQueryBuilders bool(BoolQueryBuilder boolQueryBuilder) {
list.add(boolQueryBuilder);
return this;
}
public ESQueryBuilders nested(NestedQueryBuilder nestedQueryBuilder) {
list.add(nestedQueryBuilder);
return this;
}
public List<QueryBuilder> listBuilders() {
return list;
}
}
ESSimpleExpression.java
package com.hsiehchou.es.V2;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import java.util.Collection;
import com.hsiehchou.es.V2.ESCriterion.Operator;
import static org.elasticsearch.index.search.MatchQuery.Type.PHRASE;
/**
* 条件表达式
*/
public class ESSimpleExpression {
private String[] fieldNames; //属性名
private String fieldName; //属性名
private Object value; //对应值
private Collection<Object> values;//对应值
private Operator operator; //计算符
private Object from;
private Object to;
protected ESSimpleExpression() {
}
protected ESSimpleExpression(Object value, Operator operator,String... fieldNames) {
this.fieldNames = fieldNames;
this.value = value;
this.operator = operator;
}
protected ESSimpleExpression(String fieldName, Object value, Operator operator) {
this.fieldName = fieldName;
this.value = value;
this.operator = operator;
}
protected ESSimpleExpression(String value, Operator operator) {
this.value = value;
this.operator = operator;
}
protected ESSimpleExpression(String fieldName, Collection<Object> values) {
this.fieldName = fieldName;
this.values = values;
this.operator = Operator.TERMS;
}
protected ESSimpleExpression(String fieldName, Object from, Object to) {
this.fieldName = fieldName;
this.from = from;
this.to = to;
this.operator = Operator.RANGE;
}
public BoolQueryBuilder toBoolQueryBuilder(){
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.mustNot(QueryBuilders.matchQuery("",""));
boolQueryBuilder.mustNot(QueryBuilders.matchQuery("",""));
return null;
}
public QueryBuilder toBuilder() {
QueryBuilder qb = null;
switch (operator) {
case MATCH:
qb = QueryBuilders.matchQuery(fieldName, value);
break;
case MATCH_PHRASE:
qb = QueryBuilders.matchPhraseQuery(fieldName, value);
break;
case MULTI_MATCH:
qb = QueryBuilders.multiMatchQuery(value,fieldNames).type(PHRASE);
break;
case TERM:
qb = QueryBuilders.termQuery(fieldName, value);
break;
case TERMS:
qb = QueryBuilders.termsQuery(fieldName, values);
break;
case RANGE:
qb = QueryBuilders.rangeQuery(fieldName).from(from).to(to).includeLower(true).includeUpper(true);
break;
case GTE:
qb = QueryBuilders.rangeQuery(fieldName).gte(value);
break;
case LTE:
qb = QueryBuilders.rangeQuery(fieldName).lte(value);
break;
case FUZZY:
qb = QueryBuilders.fuzzyQuery(fieldName, value);
break;
case PREFIX:
qb = QueryBuilders.prefixQuery(fieldName, value.toString());
break;
case QUERY_STRING:
qb = QueryBuilders.queryStringQuery(value.toString());
break;
default:
}
return qb;
}
}
九、Alerting
Rules are configured from the backend or the UI, stored in MySQL, and then synchronized to Redis.
Querying MySQL directly for every record is far too slow once data volume grows, so the rules are cached in the in-memory store Redis and incoming records are compared against Redis when alerting.
MySQL needs two tables:
- a rule table that stores the alert rules
- a message table that stores the generated alert messages
1、Create the rule table (rule publishing is controlled from the UI)
Rules are stored in MySQL first; a scheduled task then synchronizes them from MySQL into Redis.
Create the table directly in the test database.
Creation script:
xz_rule.sql
SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `xz_rule`;
CREATE TABLE `xz_rule` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`warn_fieldname` varchar(20) DEFAULT NULL,
`warn_fieldvalue` varchar(255) DEFAULT NULL,
`publisher` varchar(255) DEFAULT NULL,
`send_type` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`send_mobile` varchar(255) DEFAULT NULL,
`send_mail` varchar(255) DEFAULT NULL,
`send_dingding` varchar(255) DEFAULT NULL,
`create_time` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;
INSERT INTO `xz_rule` VALUES ('1', 'phone', '18609765432', '?????1', '2', '13724536789', '1782324@qq.com', '32143243', '2019-06-28');
2、Create the message table
- stores the alert messages so the UI can poll for new alerts or scroll them on screen
- also used for alert-message statistics
warn_message.sql
SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `warn_message`;
CREATE TABLE `warn_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`alarmRuleid` varchar(255) DEFAULT NULL,
`alarmType` varchar(255) DEFAULT NULL,
`sendType` varchar(255) DEFAULT NULL,
`sendMobile` varchar(255) DEFAULT NULL,
`sendEmail` varchar(255) DEFAULT NULL,
`sendStatus` varchar(255) DEFAULT NULL,
`senfInfo` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`hitTime` datetime DEFAULT NULL,
`checkinTime` datetime DEFAULT NULL,
`isRead` varchar(255) DEFAULT NULL,
`readAccounts` varchar(255) DEFAULT NULL,
`alarmaccounts` varchar(255) DEFAULT NULL,
`accountid` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=31 DEFAULT CHARSET=latin1;
3、Create the database connection utility class
Create a new package com.hsiehchou.common.netb.db
Create the DBCommon class
DBCommon.java
package com.hsiehchou.common.netb.db;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.Properties;
public class DBCommon {
private static Logger LOG = LoggerFactory.getLogger(DBCommon.class);
private static String MYSQL_PATH = "common/mysql.properties";
private static Properties properties = ConfigUtil.getInstance().getProperties(MYSQL_PATH);
private static Connection conn ;
private DBCommon(){}
public static void main(String[] args) {
System.out.println(properties);
Connection xz_bigdata = DBCommon.getConn("test");
System.out.println(xz_bigdata);
}
//TODO 配置文件
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
private static final String USER_NAME = properties.getProperty("user");
private static final String PASSWORD = properties.getProperty("password");
private static final String IP = properties.getProperty("db_ip");
private static final String PORT = properties.getProperty("db_port");
private static final String DB_CONFIG = "?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&autoReconnect=true&failOverReadOnly=false";
static {
try {
Class.forName(JDBC_DRIVER);
} catch (ClassNotFoundException e) {
LOG.error(null, e);
}
}
/**
* 获取数据库连接
* @param dbName
* @return
*/
public static Connection getConn(String dbName) {
Connection conn = null;
String connstring = "jdbc:mysql://"+IP+":"+PORT+"/"+dbName+DB_CONFIG;
try {
conn = DriverManager.getConnection(connstring, USER_NAME, PASSWORD);
} catch (SQLException e) {
e.printStackTrace();
LOG.error(null, e);
}
return conn;
}
/**
* @param url eg:"jdbc:oracle:thin:@172.16.1.111:1521:d406"
* @param driver eg:"oracle.jdbc.driver.OracleDriver"
* @param user eg:"ucase"
* @param password eg:"ucase123"
* @return
* @throws ClassNotFoundException
* @throws SQLException
*/
public static Connection getConn(String url, String driver, String user,
String password) throws ClassNotFoundException, SQLException{
Class.forName(driver);
conn = DriverManager.getConnection(url, user, password);
return conn;
}
public static void close(Connection conn){
try {
if( conn != null ){
conn.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Statement statement){
try {
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,PreparedStatement statement){
try {
if( conn != null ){
conn.close();
}
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,Statement statement,ResultSet resultSet) throws SQLException{
if( resultSet != null ){
resultSet.close();
}
if( statement != null ){
statement.close();
}
if( conn != null ){
conn.close();
}
}
}
Add the Maven dependency (the commons-dbutils.version property must be declared in the parent pom, for example 1.6):
<dependency>
<groupId>commons-dbutils</groupId>
<artifactId>commons-dbutils</artifactId>
<version>${commons-dbutils.version}</version>
</dependency>
4、Create the entity classes and DAOs
Create a new package com.hsiehchou.spark.warn.domain
Create XZ_RuleDomain and WarningMessage
XZ_RuleDomain.java
package com.hsiehchou.spark.warn.domain;
import java.sql.Date;
public class XZ_RuleDomain {
private int id;
private String warn_fieldname; //预警字段
private String warn_fieldvalue; //预警内容
private String publisher; //发布者
private String send_type; //消息接收方式
private String send_mobile; //接收手机号
private String send_mail; //接收邮箱
private String send_dingding; //接收钉钉
private Date create_time; //创建时间
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public String getWarn_fieldname() {
return warn_fieldname;
}
public void setWarn_fieldname(String warn_fieldname) {
this.warn_fieldname = warn_fieldname;
}
public String getWarn_fieldvalue() {
return warn_fieldvalue;
}
public void setWarn_fieldvalue(String warn_fieldvalue) {
this.warn_fieldvalue = warn_fieldvalue;
}
public String getPublisher() {
return publisher;
}
public void setPublisher(String publisher) {
this.publisher = publisher;
}
public String getSend_type() {
return send_type;
}
public void setSend_type(String send_type) {
this.send_type = send_type;
}
public String getSend_mobile() {
return send_mobile;
}
public void setSend_mobile(String send_mobile) {
this.send_mobile = send_mobile;
}
public String getSend_mail() {
return send_mail;
}
public void setSend_mail(String send_mail) {
this.send_mail = send_mail;
}
public String getSend_dingding() {
return send_dingding;
}
public void setSend_dingding(String send_dingding) {
this.send_dingding = send_dingding;
}
public Date getCreate_time() {
return create_time;
}
public void setCreate_time(Date create_time) {
this.create_time = create_time;
}
}
WarningMessage.java
package com.hsiehchou.spark.warn.domain;
import java.sql.Date;
public class WarningMessage {
private String id; //主键id
private String alarmRuleid; //规则id
private String alarmType; //告警类型
private String sendType; //发送方式
private String sendMobile; //发送至手机
private String sendEmail; //发送至邮箱
private String sendStatus; //发送状态
private String senfInfo; //发送内容
private Date hitTime; //命中时间
private Date checkinTime; //入库时间
private String isRead; //是否已读
private String readAccounts; //已读用户
private String alarmaccounts;
private String accountid;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getAlarmRuleid() {
return alarmRuleid;
}
public void setAlarmRuleid(String alarmRuleid) {
this.alarmRuleid = alarmRuleid;
}
public String getAlarmType() {
return alarmType;
}
public void setAlarmType(String alarmType) {
this.alarmType = alarmType;
}
public String getSendType() {
return sendType;
}
public void setSendType(String sendType) {
this.sendType = sendType;
}
public String getSendMobile() {
return sendMobile;
}
public void setSendMobile(String sendMobile) {
this.sendMobile = sendMobile;
}
public String getSendEmail() {
return sendEmail;
}
public void setSendEmail(String sendEmail) {
this.sendEmail = sendEmail;
}
public String getSendStatus() {
return sendStatus;
}
public void setSendStatus(String sendStatus) {
this.sendStatus = sendStatus;
}
public String getSenfInfo() {
return senfInfo;
}
public void setSenfInfo(String senfInfo) {
this.senfInfo = senfInfo;
}
public Date getHitTime() {
return hitTime;
}
public void setHitTime(Date hitTime) {
this.hitTime = hitTime;
}
public Date getCheckinTime() {
return checkinTime;
}
public void setCheckinTime(Date checkinTime) {
this.checkinTime = checkinTime;
}
public String getIsRead() {
return isRead;
}
public void setIsRead(String isRead) {
this.isRead = isRead;
}
public String getReadAccounts() {
return readAccounts;
}
public void setReadAccounts(String readAccounts) {
this.readAccounts = readAccounts;
}
public String getAlarmaccounts() {
return alarmaccounts;
}
public void setAlarmaccounts(String alarmaccounts) {
this.alarmaccounts = alarmaccounts;
}
public String getAccountid() {
return accountid;
}
public void setAccountid(String accountid) {
this.accountid = accountid;
}
@Override
public String toString() {
return "WarningMessage{" +
"id='" + id + '\'' +
", alarmRuleid='" + alarmRuleid + '\'' +
", alarmType='" + alarmType + '\'' +
", sendType='" + sendType + '\'' +
", sendMobile='" + sendMobile + '\'' +
", sendEmail='" + sendEmail + '\'' +
", sendStatus='" + sendStatus + '\'' +
", senfInfo='" + senfInfo + '\'' +
", hitTime=" + hitTime +
", checkinTime=" + checkinTime +
", isRead='" + isRead + '\'' +
", readAccounts='" + readAccounts + '\'' +
", alarmaccounts='" + alarmaccounts + '\'' +
", accountid='" + accountid + '\'' +
'}';
}
}
Create a new package com.hsiehchou.spark.warn.dao
Create XZ_RuleDao and WarningMessageDao
XZ_RuleDao.java
package com.hsiehchou.spark.warn.dao;
import com.hsiehchou.common.netb.db.DBCommon;
import com.hsiehchou.spark.warn.domain.XZ_RuleDomain;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.BeanListHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
public class XZ_RuleDao {
private static final Logger LOG = LoggerFactory.getLogger(XZ_RuleDao.class);
/**
* 获取所有的规则
* @return
*/
public static List<XZ_RuleDomain> getRuleList(){
List<XZ_RuleDomain> listRules = null;
//获取连接
Connection conn = DBCommon.getConn("test");
//执行器
QueryRunner query = new QueryRunner();
String sql = "select * from xz_rule";
try {
listRules = query.query(conn,sql,new BeanListHandler<>(XZ_RuleDomain.class));
} catch (SQLException e) {
LOG.error(null,e);
}finally {
DBCommon.close(conn);
}
return listRules;
}
public static void main(String[] args) {
List<XZ_RuleDomain> ruleList = XZ_RuleDao.getRuleList();
System.out.println(ruleList.size());
ruleList.forEach(x->{
System.out.println(x);
});
}
}
WarningMessageDao.java
package com.hsiehchou.spark.warn.dao;
import com.hsiehchou.common.netb.db.DBCommon;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
public class WarningMessageDao {
private static final Logger LOG = LoggerFactory.getLogger(WarningMessageDao.class);
/**
* 写入消息到mysql
* @param warningMessage
* @return
*/
public static Integer insertWarningMessageReturnId(WarningMessage warningMessage) {
Connection conn= DBCommon.getConn("test");
String sql="insert into warn_message(alarmruleid,sendtype,senfinfo,hittime,sendmobile,alarmtype) " +
"values(?,?,?,?,?,?)";
PreparedStatement stmt=null;
ResultSet resultSet=null;
int id=-1;
try{
stmt = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
stmt.setString(1,warningMessage.getAlarmRuleid());
stmt.setInt(2,Integer.valueOf(warningMessage.getSendType()));
stmt.setString(3,warningMessage.getSenfInfo());
stmt.setTimestamp(4,new Timestamp(System.currentTimeMillis()));
stmt.setString(5,warningMessage.getSendMobile());
stmt.setInt(6,Integer.valueOf(warningMessage.getAlarmType()));
stmt.executeUpdate();
// read back the auto-increment primary key generated for this insert
resultSet = stmt.getGeneratedKeys();
if (resultSet.next()) {
id = resultSet.getInt(1);
}
}catch(Exception e) {
LOG.error(null,e);
}finally {
try {
DBCommon.close(conn,stmt,resultSet);
} catch (SQLException e) {
e.printStackTrace();
}
}
return id;
}
}
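A quick sanity check for the DAO above. Note that sendType and alarmType are written through Integer.valueOf(...), so both must be numeric strings ("2" means SMS in this project's convention); the other values are samples.
WarningMessageDaoDemo.java
package com.hsiehchou.spark.warn.dao;
import com.hsiehchou.spark.warn.domain.WarningMessage;
public class WarningMessageDaoDemo {
    public static void main(String[] args) {
        WarningMessage msg = new WarningMessage();
        msg.setAlarmRuleid("1");          // id of the rule that was hit
        msg.setAlarmType("2");
        msg.setSendType("2");             // 2 = SMS
        msg.setSendMobile("13724536789"); // sample number from the rule table insert
        msg.setSenfInfo("test warning message");
        int id = WarningMessageDao.insertWarningMessageReturnId(msg);
        System.out.println("generated id = " + id);
    }
}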
5、Alert utility classes
Create a new package com.hsiehchou.spark.warn.service
Create BlackRuleWarning and WarningMessageSendUtil
BlackRuleWarning.java
package com.hsiehchou.spark.warn.service;
import com.hsiehchou.spark.warn.dao.WarningMessageDao;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
public class BlackRuleWarning {
private static final Logger LOG = LoggerFactory.getLogger(BlackRuleWarning.class);
//可以通过数据库,配置文件加载
//为了遍历所有预警字段
private static List<String> listWarnFields = new ArrayList<>();
static {
listWarnFields.add("phone");
listWarnFields.add("mac");
}
/**
* 预警流程处理
* @param map
* @param jedis15
*/
public static void blackWarning(Map<String, Object> map, Jedis jedis15) {
listWarnFields.forEach(warnField -> {
if (map.containsKey(warnField) && StringUtils.isNotBlank(map.get(warnField).toString())) {
//获取预警字段核预警值 相当于手机号
String warnFieldValue = map.get(warnField).toString();
//去redis中进行比对
//数据中 通过 "字段" + "字段值" 去拼接key
// phone : 186XXXXXX
String key = warnField + ":" + warnFieldValue;
//redis中的key是 phone:18609765435
System.out.println("拼接数据流中的key=======" + key);
if (jedis15.exists(key)) {
//对比命中之后 就可以发送消息提醒
System.out.println("命中REDIS中的" + key + "===========开始预警");
beginWarning(jedis15, key);
} else {
//直接过
System.out.println("未命中" + key + "===========不进行预警");
}
}
});
}
/**
* 规则已经命中,开始预警
* @param jedis15
* @param key
*/
private static void beginWarning( Jedis jedis15, String key) {
System.out.println("============MESSAGE -1- =========");
//封装告警 信息及告警消息
WarningMessage warningMessage = getWarningMessage(jedis15, key);
System.out.println("============MESSAGE -4- =========");
if (warningMessage != null) {
//将预警信息写入预警信息表
WarningMessageDao.insertWarningMessageReturnId(warningMessage);
//String accountid = warningMessage.getAccountid();
//String readAccounts = warningMessage.getAlarmaccounts();
// WarnService.insertRead_status(messageId, accountid);
if (warningMessage.getSendType().equals("2")) {
//手机短信告警 默认告警方式
WarningMessageSendUtil.messageWarn(warningMessage);
}
}
}
/**
* 封装告警信息及告警消息
* @param jedis15
* @param key
* @return
*/
private static WarningMessage getWarningMessage(Jedis jedis15, String key) {
System.out.println("============MESSAGE -2- =========");
//封装消息
String[] split = key.split(":");
if (split.length == 2) {
WarningMessage warningMessage = new WarningMessage();
String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
String clew_type = split[0];//告警字段
String rulecontent = split[1];//告警字段值
//从redis中获取消息信息进行封装
Map<String, String> valueMap = jedis15.hgetAll(key);
//规则ID (是哪条规则命中的)
warningMessage.setAlarmRuleid(valueMap.get("id"));
//预警方式
warningMessage.setSendType(valueMap.get("send_type"));//告警方式,0:界面 1:邮件 2:短信 3:邮件+短信
//预警信息接收手机号
warningMessage.setSendMobile(valueMap.get("send_mobile"));
//arningMessage.setSendEmail(valueMap.get("sendemail"));
/*arningMessage.setAlarmaccounts(valueMap.get("alarmaccounts"));*/
//规则发布人
warningMessage.setAccountid(valueMap.get("publisher"));
warningMessage.setAlarmType("2");
StringBuffer warn_content = new StringBuffer();
//预警内容 信息 时间 地点 人物
//预警字段来进行设置 phone
//我们有手机号
//数据关联
// 手机 MAC 身份证, 车牌 人脸。。URL 姓名
// 全部设在推送消息里面
// alert text; location and collecting-device details can be appended here once the record carries them
warn_content.append("【网络告警】:手机号[" + rulecontent + "]在时间" + time + "命中预警规则");
String content = warn_content.toString();
warningMessage.setSenfInfo(content);
System.out.println("============MESSAGE -3- =========");
return warningMessage;
} else {
return null;
}
}
}
WarningMessageSendUtil.java
package com.hsiehchou.spark.warn.service;
import com.hsiehchou.common.regex.Validation;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class WarningMessageSendUtil {
private static final Logger LOG = LoggerFactory.getLogger(WarningMessageSendUtil.class);
public static void messageWarn(WarningMessage warningMessage) {
String[] mobiles = warningMessage.getSendMobile().split(",");
for(String phone:mobiles){
if(Validation.isMobile(phone)){
System.out.println("开始向手机号为" + phone + "发送告警消息====" + warningMessage);
StringBuffer sb= new StringBuffer();
String content=warningMessage.getSenfInfo().toString();
//TODO 调用短信接口发送消息
//TODO 怎么通过短信发送 这个是需要公司开通接口
//TODO DINGDING
// 专门的接口
/* sb.append(ClusterProperties.https_url + "username=" + ClusterProperties.https_username +
"&password=" + ClusterProperties.https_password + "&mobile=" + phone +
"&apikey=" + ClusterProperties.https_apikey+
"&content=" + URLEncoder.encode(content));*/
// sendMessage(sb.toString());
}
}
}
}
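The TODOs above leave the actual SMS gateway call open because it depends on a company-provided interface. Purely as an illustration, a hypothetical sendMessage(...) sketch against a generic HTTP GET gateway; the URL and parameter names are placeholders, not a real provider's API.
SmsGatewaySketch.java
package com.hsiehchou.spark.warn.service;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
public class SmsGatewaySketch {
    public static int sendMessage(String gatewayUrl, String apikey, String mobile, String content) throws Exception {
        // build the request URL with URL-encoded parameters
        String url = gatewayUrl
                + "?apikey=" + URLEncoder.encode(apikey, "UTF-8")
                + "&mobile=" + URLEncoder.encode(mobile, "UTF-8")
                + "&content=" + URLEncoder.encode(content, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(5000);
        int code = conn.getResponseCode(); // most gateways return 200 when the message is accepted
        conn.disconnect();
        return code;
    }
}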
6、Create the Redis sub-project
Used for all Redis operations.
Create the xz_bigdata_redis sub-module
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_redis</artifactId>
<name>xz_bigdata_redis</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<jedis.version>2.7.0</jedis.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>${jedis.version}</version>
</dependency>
</dependencies>
</project>
Create a new package com.hsiehchou.redis.client
Create the Redis connection class JedisSingle
JedisSingle.java
package com.hsiehchou.redis.client;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisConnectionException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.Properties;
public class JedisSingle {
private static final Logger LOG = LoggerFactory.getLogger(JedisSingle.class);
private static Properties redisConf;
/**
* 读取redis配置文件
* redis.hostname = 192.168.247.103
* redis.port = 6379
*/
static {
redisConf = ConfigUtil.getInstance().getProperties("redis/redis.properties");
System.out.println(redisConf);
}
public static Jedis getJedis(int db){
Jedis jedis = JedisSingle.getJedis();
if(jedis!=null){
jedis.select(db);
}
return jedis;
}
public static void main(String[] args) {
Jedis jedis = JedisSingle.getJedis(15);
Map<String, String> Map = jedis.hgetAll("phone:18609765435");
System.out.println(Map.toString());
}
public static Jedis getJedis(){
int timeoutCount = 0;
while (true) {// 如果是网络超时则多试几次
try
{
Jedis jedis = new Jedis(redisConf.get("redis.hostname").toString(),
Integer.valueOf(redisConf.get("redis.port").toString()));
return jedis;
} catch (Exception e)
{
if (e instanceof JedisConnectionException || e instanceof SocketTimeoutException)
{
timeoutCount++;
LOG.warn("获取jedis连接超时次数:" +timeoutCount);
if (timeoutCount > 4)
{
LOG.error("获取jedis连接超时次数a:" +timeoutCount);
LOG.error(null,e);
break;
}
}else
{
LOG.error("getJedis error", e);
break;
}
}
}
return null;
}
public static void close(Jedis jedis){
if(jedis!=null){
jedis.close();
}
}
}
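JedisSingle opens a brand-new connection on every call, which is fine for testing but wasteful under a streaming load. Below is a minimal pooled variant, sketched under the assumption that the same redis.properties keys are used; the pool sizes are illustrative only.
JedisPoolSingle.java
package com.hsiehchou.redis.client;
import com.hsiehchou.common.config.ConfigUtil;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import java.util.Properties;
public class JedisPoolSingle {
    private static final Properties CONF = ConfigUtil.getInstance().getProperties("redis/redis.properties");
    private static final JedisPool POOL;
    static {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(20); // illustrative maximum number of pooled connections
        config.setMaxIdle(5);
        POOL = new JedisPool(config,
                CONF.getProperty("redis.hostname"),
                Integer.parseInt(CONF.getProperty("redis.port")));
    }
    /** Borrow a connection and select the given db; close() it to return it to the pool. */
    public static Jedis getJedis(int db) {
        Jedis jedis = POOL.getResource();
        jedis.select(db);
        return jedis;
    }
}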
7、Create a scheduled task that synchronizes the rules to Redis
Create a new package com.hsiehchou.spark.warn.timer
Create SyncRule2Redis and WarnHelper
SyncRule2Redis.java
package com.hsiehchou.spark.warn.timer;
import java.util.TimerTask;
public class SyncRule2Redis extends TimerTask {
@Override
public void run() {
//这里定义同步方法
//就是读取mysql的数据 然后写入到redis中
System.out.println("========开始同步MYSQL规则到redis=======");
WarnHelper.syncRuleFromMysql2Redis();
System.out.println("============开始同步规则成功===========");
}
}
WarnHelper.java
package com.hsiehchou.spark.warn.timer;
import com.hsiehchou.redis.client.JedisSingle;
import com.hsiehchou.spark.warn.dao.XZ_RuleDao;
import com.hsiehchou.spark.warn.domain.XZ_RuleDomain;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;
import java.util.List;
public class WarnHelper {
private static final Logger LOG = LoggerFactory.getLogger(WarnHelper.class);
/**
* 同步mysql规则数据到redis
*/
public static void syncRuleFromMysql2Redis(){
//获取所有的规则
List<XZ_RuleDomain> ruleList = XZ_RuleDao.getRuleList();
Jedis jedis = null;
try {
//获取redis 客户端
jedis = JedisSingle.getJedis(15);
for (int i = 0; i < ruleList.size(); i++) {
XZ_RuleDomain rule = ruleList.get(i);
String id = rule.getId()+"";
String publisher = rule.getPublisher();
String warn_fieldname = rule.getWarn_fieldname();
String warn_fieldvalue = rule.getWarn_fieldvalue();
String send_mobile = rule.getSend_mobile();
String send_type = rule.getSend_type();
//拼接redis key值
String redisKey = warn_fieldname +":" + warn_fieldvalue;
//通过redis hash结构 hashMap
jedis.hset(redisKey,"id",StringUtils.isNoneBlank(id) ? id : "");
jedis.hset(redisKey,"publisher",StringUtils.isNoneBlank(publisher) ? publisher : "");
jedis.hset(redisKey,"warn_fieldname",StringUtils.isNoneBlank(warn_fieldname) ? warn_fieldname : "");
jedis.hset(redisKey,"warn_fieldvalue",StringUtils.isNoneBlank(warn_fieldvalue) ? warn_fieldvalue : "");
jedis.hset(redisKey,"send_mobile",StringUtils.isNoneBlank(send_mobile) ? send_mobile : "");
jedis.hset(redisKey,"send_type",StringUtils.isNoneBlank(send_type) ? send_type : "");
}
} catch (Exception e) {
LOG.error("同步规则到es失败",e);
} finally {
JedisSingle.close(jedis);
}
}
public static void main(String[] args)
{
WarnHelper.syncRuleFromMysql2Redis();
}
}
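为了验证 WarnHelper 写入的 hash 结构,可以用下面这个最小化的读取示例(纯演示代码,规则 key phone:18612345678 为假设值,前提是该规则已经同步到 db15):
package com.hsiehchou.spark.warn.timer;

import com.hsiehchou.redis.client.JedisSingle;
import redis.clients.jedis.Jedis;
import java.util.Map;

//仅用于验证的示例类(假设性代码),key 格式与 WarnHelper 中拼接的 warn_fieldname:warn_fieldvalue 一致
public class RuleReadExample {
    public static void main(String[] args) {
        Jedis jedis = JedisSingle.getJedis(15);
        try {
            //假设存在一条 warn_fieldname=phone、warn_fieldvalue=18612345678 的规则
            Map<String, String> rule = jedis.hgetAll("phone:18612345678");
            System.out.println(rule);
        } finally {
            JedisSingle.close(jedis);
        }
    }
}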
8、创建streaming流任务
scala/com/hsiehchou/spark/streaming/kafka/warn
WarningStreamingTask.scala
package com.hsiehchou.spark.streaming.kafka.warn
import java.util.Timer
import com.hsiehchou.redis.client.JedisSingle
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming.kafkaConfig
import com.hsiehchou.spark.warn.service.BlackRuleWarning
import com.hsiehchou.spark.warn.timer.SyncRule2Redis
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaManager
import redis.clients.jedis.Jedis
object WarningStreamingTask extends Serializable with Logging{
def main(args: Array[String]): Unit = {
//定义一个定时器去定时同步 MYSQL到REDIS
val timer : Timer = new Timer
//SyncRule2Redis 任务类
//0 第一次开始执行
//1*60*1000 隔多少时间执行一次
timer.schedule(new SyncRule2Redis,0,1*60*1000)
//从kafka中获取数据流
//kafka topic
val topics = "chl_test7".split(",")
//val ssc = SparkContextFactory.newSparkLocalStreamingContext("WarningStreamingTask1", java.lang.Long.valueOf(10),1)
val ssc:StreamingContext = SparkContextFactory.newSparkStreamingContext("WarningStreamingTask", java.lang.Long.valueOf(10))
//构建kafkaManager
val kafkaManager = new KafkaManager(
Spark_Kafka_ConfigUtil.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"), "WarningStreamingTask111")
)
//使用kafkaManager创建DStreaming流
val kafkaDS = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
//添加一个日期分组字段
//如果数据其他的转换,可以先在这里进行统一转换
.persist(StorageLevel.MEMORY_AND_DISK)
kafkaDS.foreachRDD(rdd=>{
//流量预警
//if(!rdd.isEmpty()){
/* val count_flow = rdd.map(x=>{
val flow = java.lang.Long.valueOf(x.get("collect_time"))
flow
}).reduce(_+_)
if(count_flow > 1719179595L){
println("流量预警: 阈值[1719179595L] 实际值:"+ count_flow)
}*/
//}
//客户端连接之类的对象最好不要放在RDD外面,因为处理partition时,数据需要分发到各个节点上去
//分发的数据必须可以序列化,如果不能序列化,分发会报错
//如果这个对象(包括它里面的内容)都可以序列化,那么可以直接放在RDD外面
var jedis:Jedis = null
try {
//jedis = JedisSingle.getJedis(15)
rdd.foreachPartition(partion => {
jedis = JedisSingle.getJedis(15)
while (partion.hasNext) {
val map = partion.next()
val table = map.get("table")
val mapObject = map.asInstanceOf[java.util.Map[String,Object]]
println(table)
//开始比对
BlackRuleWarning.blackWarning(mapObject,jedis)
}
})
} catch {
case e => e.printStackTrace()
} finally {
JedisSingle.close(jedis)
}
/* rdd.foreachPartition(partion => {
var jedis: Jedis = null
try {
jedis = JedisSingle.getJedis(15)
while (partion.hasNext) {
val map = partion.next()
val mapObject = map.asInstanceOf[java.util.Map[String, Object]]
//开始比对
BlackRuleWarning.blackWarning(mapObject, jedis)
}
} catch {
case e => logError(null,e)
}finally {
JedisSingle.close(jedis)
}
})*/
})
ssc.start()
ssc.awaitTermination()
}
}
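BlackRuleWarning 的实现不在本节代码中。下面给出一个按上面 redis key 规则(warn_fieldname:warn_fieldvalue)进行比对的简化示意,字段列表和命中后的处理方式均为假设,仅用于说明思路:
package com.hsiehchou.spark.warn.service;

import redis.clients.jedis.Jedis;
import java.util.Map;

//假设性示意:真正的 BlackRuleWarning 实现不在本节中
public class BlackRuleWarningSketch {
    //需要参与比对的字段,假设与规则里的 warn_fieldname 一致
    private static final String[] WARN_FIELDS = {"phone", "phone_mac"};

    public static void blackWarning(Map<String, Object> data, Jedis jedis) {
        for (String field : WARN_FIELDS) {
            Object value = data.get(field);
            if (value == null) {
                continue;
            }
            //与 WarnHelper 写入的 key 规则保持一致:warn_fieldname:warn_fieldvalue
            String redisKey = field + ":" + value;
            if (jedis.exists(redisKey)) {
                Map<String, String> rule = jedis.hgetAll(redisKey);
                //命中规则,这里只是打印;实际可按 send_type/send_mobile 发送短信或写入预警表
                System.out.println("命中黑名单规则:" + redisKey + " => " + rule);
            }
        }
    }
}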
9、执行
spark-submit \
--master local[1] \
--num-executors 1 \
--driver-memory 300m \
--executor-memory 500m \
--executor-cores 1 \
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
--class com.hsiehchou.spark.streaming.kafka.warn.WarningStreamingTask /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
10、截图
11、redis安装
解压:tar -zxvf redis-3.0.5.tar.gz
cd redis-3.0.5/
make
make PREFIX=/opt/software/redis install
redis-benchmark : Redis提供的压力测试工具。模拟产生客户端的压力
redis-check-aof : 检查aof日志文件
redis-check-dump : 检查rdb文件
redis-cli : Redis客户端脚本
redis-sentinel : 哨兵
redis-server : Redis服务器脚本
核心配置文件:redis.conf
[root@hsiehchou202 redis-3.0.5]# cp redis.conf /opt/software/redis
[root@hsiehchou202 redis]# mkdir conf
[root@hsiehchou202 redis]# mv redis.conf conf/
[root@hsiehchou202 conf]# vi redis.conf
42行 daemonize yes //后台方式运行
50行 port 6379
启动redis:./bin/redis-server conf/redis.conf
检测是否启动好
[root@hsiehchou202 redis]# ps -ef | grep redis
也可以用客户端验证:bin/redis-cli ping,返回 PONG 说明启动成功
十、Spark—kafka2hive
1、CDH启用Hive on spark
设置 hive on spark 参数
原来的HIVE执行引擎使用的是hadoop的mapreduce,Hive on Spark 就是将执行引擎换为spark引擎(即把 hive.execution.engine 设置为 spark)
2、hive配置文件
scala/com/hsiehchou/spark/streaming/kafka/kafka2hdfs/
HiveConfig.scala
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs
import java.util
import org.apache.commons.configuration.{CompositeConfiguration, ConfigurationException, PropertiesConfiguration}
import org.apache.spark.Logging
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import scala.collection.mutable.ArrayBuffer
import scala.collection.JavaConversions._
object HiveConfig extends Serializable with Logging {
//HIVE 文件根目录
var hive_root_path = "/apps/hive/warehouse/external/"
var hiveFieldPath = "es/mapping/fieldmapping.properties"
var config: CompositeConfiguration = null
//所有的表
var tables: util.List[_] = null
//表对应所有的字段映射,可以通过table名获取 这个table的所有字段
var tableFieldsMap: util.Map[String, util.HashMap[String, String]] = null
//StructType
var mapSchema: util.Map[String, StructType] = null
//建表语句
var hiveTableSQL: util.Map[String, String] = null
/**
* 主要就是创建mapSchema 和 hiveTableSQL
*/
initParams()
def main(args: Array[String]): Unit = {
}
/**
* 初始化HIVE参数
*/
def initParams(): Unit = {
//加载es/mapping/fieldmapping.properties 配置文件
config = HiveConfig.readCompositeConfiguration(hiveFieldPath)
println("==========================config====================================")
config.getKeys.foreach(key => {
println(key + ":" + config.getProperty(key.toString))
})
println("==========================tables====================================")
//wechat,mail,qq
tables = config.getList("tables")
tables.foreach(table => {
println(table)
})
var tables1 = config.getProperty("tables")
println("======================tableFieldsMap================================")
//(qq,{qq.imsi=string, qq.id=string, qq.send_message=string, qq.filename=string})
tableFieldsMap = HiveConfig.getKeysByType()
tableFieldsMap.foreach(x => {
println(x)
})
println("=========================mapSchema===================================")
mapSchema = HiveConfig.createSchema()
mapSchema.foreach(x => {
// val structType = x._2
// println("-----------")
// println(structType)
//
//
// val names = structType.fieldNames
// names.foreach(field => {
// println(field)
// })
println(x)
})
println("=========================hiveTableSQL===================================")
hiveTableSQL = HiveConfig.getHiveTables()
hiveTableSQL.foreach(x => {
println(x)
})
}
/**
* 读取hive 字段配置文件
* @param path
* @return
*/
def readCompositeConfiguration(path: String): CompositeConfiguration = {
logInfo("加载配置文件 " + path)
//多配置工具
val compositeConfiguration = new CompositeConfiguration
try {
val configuration = new PropertiesConfiguration(path)
compositeConfiguration.addConfiguration(configuration)
} catch {
case e: ConfigurationException => {
logError("加载配置文件 " + path + "失败", e)
}
}
logInfo("加载配置文件" + path + "成功。 ")
compositeConfiguration
}
/**
* 获取table-字段 对应关系
* 使用 util.Map[String,util.HashMap[String, String结构保存
* @return
*/
def getKeysByType(): util.Map[String, util.HashMap[String, String]] = {
val map = new util.HashMap[String, util.HashMap[String, String]]()
println("__________________tables_____________________"+tables)
//wechat, mail, qq
val iteratorTable = tables.iterator()
//对每个表进行遍历
while (iteratorTable.hasNext) {
//使用一个MAP保存一种对应关系
val fieldMap = new util.HashMap[String, String]()
//获取一个表
val table: String = iteratorTable.next().toString
//获取这个表的所有字段
val fields = config.getKeys(table)
//获取通用字段 这里暂时没有
val commonKeys: util.Iterator[String] = config.getKeys("common").asInstanceOf[util.Iterator[String]]
//将通用字段放到map结构中去
while (commonKeys.hasNext) {
val key = commonKeys.next()
fieldMap.put(key.replace("common", table), config.getString(key))
}
//将每种表的私有字段放到map中去
while (fields.hasNext) {
val field = fields.next().toString
fieldMap.put(field, config.getString(field))
println("__________________field_____________________"+"\n"+field)
}
map.put(table, fieldMap)
}
map
}
/**
* 构建建表语句
* 例如CREATE external TABLE IF NOT EXISTS qq (imei string,imsi string,longitude string,latitude string,phone_mac string,device_mac string,device_number string,collect_time string,username string,phone string,object_username string,send_message string,accept_message string,message_time string,id string,table string,filename string,absolute_filename string)
* @return
*/
def getHiveTables(): util.Map[String, String] = {
val hiveTableSqlMap: util.Map[String, String] = new util.HashMap[String, String]()
//获取每种数据的建表语句
tables.foreach(table => {
var sql: String = s"CREATE external TABLE IF NOT EXISTS ${table} ("
val tableFields = config.getKeys(table.toString)
tableFields.foreach(tableField => {
//qq.imsi=string, qq.id=string, qq.send_message=string
val fieldType = config.getProperty(tableField.toString)
val field = tableField.toString.split("\\.")(1)
sql = sql + field
fieldType match {
//就是将配置中的类型映射为HIVE 建表语句中的类型
case "string" => sql = sql + " string,"
case "long" => sql = sql + " string,"
case "double" => sql = sql + " string,"
case _ => println("Nothing Matched!!" + fieldType)
}
})
sql = sql.substring(0, sql.length - 1)
//sql = sql + s")STORED AS PARQUET location '${hive_root_path}${table}'"
sql = sql + s") partitioned by(year string,month string,day string) STORED AS PARQUET " + s"location '${hive_root_path}${table}'"
hiveTableSqlMap.put(table.toString, sql)
})
hiveTableSqlMap
}
/**
* 使用tableFieldsMap
* 对每种类型数据创建对应的Schema
* @return
*/
def createSchema(): util.Map[String, StructType] = {
// schema 表结构
/* CREATE TABLE `warn_message` (
//arrayStructType
`id` int(11) NOT NULL AUTO_INCREMENT,
`alarmRuleid` varchar(255) DEFAULT NULL,
`alarmType` varchar(255) DEFAULT NULL,
`sendType` varchar(255) DEFAULT NULL,
`sendMobile` varchar(255) DEFAULT NULL,
`sendEmail` varchar(255) DEFAULT NULL,
`sendStatus` varchar(255) DEFAULT NULL,
`senfInfo` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`hitTime` datetime DEFAULT NULL,
`checkinTime` datetime DEFAULT NULL,
`isRead` varchar(255) DEFAULT NULL,
`readAccounts` varchar(255) DEFAULT NULL,
`alarmaccounts` varchar(255) DEFAULT NULL,
`accountid` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=528 DEFAULT CHARSET=latin1;*/
val mapStructType: util.Map[String, StructType] = new util.HashMap[String, StructType]()
for (table <- tables) {
//通过tableFieldsMap 拿到这个表的所有字段
val tableFields = tableFieldsMap.get(table)
//对这个字段进行遍历
val keyIterator = tableFields.keySet().iterator()
//创建ArrayBuffer
var arrayStructType = ArrayBuffer[StructField]()
while (keyIterator.hasNext) {
val key = keyIterator.next()
val value = tableFields.get(key)
//将key拆分 获取 "."后面的部分作为数据字段
val field = key.split("\\.")(1)
value match {
/* case "string" => arrayStructType += StructField(field, StringType, true)
case "long" => arrayStructType += StructField(field, LongType, true)
case "double" => arrayStructType += StructField(field, DoubleType, true)*/
case "string" => arrayStructType += StructField(field, StringType, true)
case "long" => arrayStructType += StructField(field, StringType, true)
case "double" => arrayStructType += StructField(field, StringType, true)
case _ => println("Nothing Matched!!" + value)
}
}
val schema = StructType(arrayStructType)
mapStructType.put(table.toString, schema)
}
mapStructType
}
}
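HiveConfig 依赖 es/mapping/fieldmapping.properties 中约定的 key 格式:tables 列出所有表名,"表名.字段=类型" 描述每个字段。下面的小例子演示了这种假设的文件格式以及 commons-configuration 的读取方式(文件内容为示例,不代表真实配置):
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs;

import org.apache.commons.configuration.PropertiesConfiguration;
import java.util.Iterator;

//假设 fieldmapping.properties 内容类似:
//tables=wechat,mail,qq
//qq.imsi=string
//qq.send_message=string
//qq.collect_time=long
public class FieldMappingExample {
    public static void main(String[] args) throws Exception {
        PropertiesConfiguration config = new PropertiesConfiguration("es/mapping/fieldmapping.properties");
        //getList 会把逗号分隔的 tables 解析成列表
        System.out.println(config.getList("tables"));
        //getKeys("qq") 返回所有以 qq 开头的 key,例如 qq.imsi
        Iterator<String> keys = config.getKeys("qq");
        while (keys.hasNext()) {
            String key = keys.next();
            System.out.println(key + "=" + config.getString(key));
        }
    }
}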
3、kafka写hdfs和创建hive表
Kafka2HiveTest.scala
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs
import java.util
import com.hsiehchou.hdfs.HdfsAdmin
import com.hsiehchou.hive.HiveConf
import com.hsiehchou.spark.common.{SparkContextFactory}
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming.kafkaConfig
import org.apache.hadoop.fs.Path
import org.apache.spark.{Logging}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.{DataFrame, Row, SaveMode}
import org.apache.spark.sql.types.StructType
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaManager
import scala.collection.JavaConversions._
object Kafka2HiveTest extends Serializable with Logging{
val topics = "chl_test7".split(",")
//获取所有数据类型
//获取所有数据的Schema
def main(args: Array[String]): Unit = {
//val ssc = SparkContextFactory.newSparkLocalStreamingContext("XZ_kafka2es", java.lang.Long.valueOf(10),1)
val ssc = SparkContextFactory.newSparkStreamingContext("Kafka2HiveTest", java.lang.Long.valueOf(10))
//1.创建HIVE表 建表SQL已经在HiveConfig中拼好了
val sc = ssc.sparkContext
val hiveContext: HiveContext = HiveConf.getHiveContext(sc)
hiveContext.setConf("spark.sql.parquet.mergeSchema", "true")
createHiveTable(hiveContext)
//kafka拿到流数据
val kafkaDS = new KafkaManager(Spark_Kafka_ConfigUtil
.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"),
"Kafka2HiveTest"))
.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
.persist(StorageLevel.MEMORY_AND_DISK)
HiveConfig.tables.foreach(table=>{
//过滤出单一数据类型(获取和table相同类型的所有数据)
val tableDS = kafkaDS.filter(x => {table.equals(x.get("table"))})
//获取数据类型的schema 表结构
val schema = HiveConfig.mapSchema.get(table)
//获取这个表的所有字段
val schemaFields: Array[String] = schema.fieldNames
tableDS.foreachRDD(rdd=>{
//TODO 数据写入HDFS
/* val sc = rdd.sparkContext
val hiveContext = HiveConf.getHiveContext(sc)
hiveContext.sql(s"USE DEFAULT")*/
//将RDD转为DF 原因:要加字段描述,写比较方便
val tableDF = rdd2DF(rdd,schemaFields,hiveContext,schema)
//多种数据一起处理
val path_all = s"hdfs://hadoop1:8020${HiveConfig.hive_root_path}${table}"
val exists = HdfsAdmin.get().getFs.exists(new Path(path_all))
//2.写到HDFS 不管存不存在我们都要把数据写入进去 通过追加的方式
//每10秒写一次,写一次会生成一个文件
tableDF.write.mode(SaveMode.Append).parquet(path_all)
//3.加载数据到HIVE
if (!exists) {
//如果不存在 进行首次加载
System.out.println("===================开始加载数据到分区=============")
hiveContext.sql(s"ALTER TABLE ${table} LOCATION '${path_all}'")
}
})
})
ssc.start()
ssc.awaitTermination()
}
/**
* 创建HIVE表
* @param hiveContext
*/
def createHiveTable(hiveContext: HiveContext): Unit ={
val keys = HiveConfig.hiveTableSQL.keySet()
keys.foreach(key=>{
val sql = HiveConfig.hiveTableSQL.get(key)
//通过hiveContext 和已经创建好的SQL语句去创建HIVE表
hiveContext.sql(sql)
println(s"创建表${key}成功")
})
}
/**
* 将RDD转为DF
* @param rdd
* @param schemaFields
* @param hiveContext
* @param schema
* @return
*/
def rdd2DF(rdd:RDD[util.Map[String,String]],
schemaFields: Array[String],
hiveContext:HiveContext,
schema:StructType): DataFrame ={
//将RDD[Map[String,String]]转为RDD[ROW]
val rddRow = rdd.map(record => {
val listRow: util.ArrayList[Object] = new util.ArrayList[Object]()
for (schemaField <- schemaFields) {
listRow.add(record.get(schemaField))
}
Row.fromSeq(listRow)
//所有分区合并成一个
}).repartition(1)
//构建DF
//def createDataFrame(rowRDD: RDD[Row], schema: StructType)
val typeDF = hiveContext.createDataFrame(rddRow, schema)
typeDF
}
}
4、Kafka2HiveTest 执行
spark-submit \
--master local[1] \
--num-executors 1 \
--driver-memory 300m \
--executor-memory 500m \
--executor-cores 1 \
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
--class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.Kafka2HiveTest /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
5、xz_bigdata_spark/src/java/
com/hsiehchou/hdfs
HdfsAdmin.java—HDFS 文件操作类
package com.hsiehchou.hdfs;
import com.hsiehchou.common.adjuster.StringAdjuster;
import com.hsiehchou.common.file.FileCommon;
import com.google.common.base.Preconditions;
import com.google.common.collect.Lists;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.log4j.Logger;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.reflect.Array;
import java.util.Collection;
import java.util.List;
/**
* HDFS 文件操作类
*/
public class HdfsAdmin {
private static Logger LOG = Logger.getLogger(HdfsAdmin.class);
private static final String HDFS_SITE = "/hadoop/hdfs-site.xml";
private static final String CORE_SITE = "/hadoop/core-site.xml";
private volatile static HdfsAdmin hdfsAdmin;
private FileSystem fs;
private HdfsAdmin(Configuration conf, Logger logger){
try {
if(conf == null) conf = newConf();
conf.set("fs.defaultFS","hdfs://hadoop1:8020");
fs = FileSystem.get(conf);
} catch (IOException e) {
LOG.error("获取 hdfs的FileSystem出现异常。", e);
}
Preconditions.checkNotNull(fs, "没有获取到可用的Hdfs的FileSystem");
this.LOG = logger;
if(this.LOG == null)
this.LOG = Logger.getLogger(HdfsAdmin.class);
}
private Configuration newConf(){
Configuration conf = new Configuration();
if(FileCommon.exist(HDFS_SITE)) conf.addResource(HDFS_SITE);
if(FileCommon.exist(CORE_SITE)) conf.addResource(CORE_SITE);
return conf;
}
public static HdfsAdmin get(){
return get(null);
}
/**
* 获取hdfsAdmin
* @param logger
* @return
*/
public static HdfsAdmin get(Logger logger){
if(hdfsAdmin == null){
synchronized (HdfsAdmin.class){
if(hdfsAdmin == null) hdfsAdmin = new HdfsAdmin(null, logger);
}
}
return hdfsAdmin;
}
public static HdfsAdmin get(Configuration conf, Logger logger){
if(hdfsAdmin == null){
synchronized (HdfsAdmin.class){
if(hdfsAdmin == null) hdfsAdmin = new HdfsAdmin(conf, logger);
}
}
return hdfsAdmin;
}
public FileStatus getFileStatus(String dir) {
FileStatus fileStatus = null;
try {
fileStatus = fs.getFileStatus(new Path(dir));
} catch (IOException e) {
LOG.error(String.format("获取文件 %s信息失败。", dir), e);
}
return fileStatus;
}
public void createFile(String dst , byte[] contents){
//目标路径
Path dstPath = new Path(dst);
//打开一个输出流
FSDataOutputStream outputStream;
try {
outputStream = fs.create(dstPath);
outputStream.write(contents);
outputStream.flush();
outputStream.close();
} catch (IOException e) {
LOG.error(String.format("创建文件 %s 失败。", dst), e);
}
LOG.info(String.format("文件: %s 创建成功!", dst));
}
//上传本地文件
public void uploadFile(String src,String dst){
//原路径
Path srcPath = new Path(src);
//目标路径
Path dstPath = new Path(dst);
//调用文件系统的文件复制函数,前面参数是指是否删除原文件,true为删除,默认为false
try {
fs.copyFromLocalFile(false,srcPath, dstPath);
} catch (IOException e) {
LOG.error(String.format("上传文件 %s 到 %s 失败。", src, dst), e);
}
//打印文件路径
LOG.info(String.format("上传文件 %s 到 %s 完成。", src, dst));
}
public void downloadFile(String src , String dst){
Path dstPath = new Path(dst) ;
try {
fs.copyToLocalFile(false, new Path(src), dstPath);
} catch (IOException e) {
LOG.error(String.format("下载文件 %s 到 %s 失败。", src, dst), e);
}
LOG.info(String.format("下载文件 %s 到 %s 完成", src, dst));
}
//文件重命名
public void rename(String oldName,String newName){
Path oldPath = new Path(oldName);
Path newPath = new Path(newName);
boolean isok = false;
try {
isok = fs.rename(oldPath, newPath);
} catch (IOException e) {
LOG.error(String.format("重命名文件 %s 为 %s 失败。", oldName, newName), e);
}
if(isok){
LOG.info(String.format("重命名文件 %s 为 %s 完成。", oldName, newName));
}else{
LOG.error(String.format("重命名文件 %s 为 %s 失败。", oldName, newName));
}
}
public void delete(String path){
delete(path, true);
}
//删除文件
public void delete(String path, boolean recursive){
Path deletePath = new Path(path);
boolean isok = false;
try {
isok = fs.delete(deletePath, recursive);
} catch (IOException e) {
LOG.error(String.format("删除文件 %s 失败。", path), e);
}
if(isok){
LOG.info(String.format("删除文件 %s 完成。", path));
}else{
LOG.error(String.format("删除文件 %s 失败。", path));
}
}
//创建目录
public void mkdir(String path){
Path srcPath = new Path(path);
boolean isok = false;
try {
isok = fs.mkdirs(srcPath);
} catch (IOException e) {
LOG.error(String.format("创建目录 %s 失败。", path), e);
}
if(isok){
LOG.info(String.format("创建目录 %s 完成。", path));
}else{
LOG.error(String.format("创建目录 %s 失败。", path));
}
}
//读取文件的内容
public InputStream readFile(String filePath){
Path srcPath = new Path(filePath);
InputStream in = null;
try {
in = fs.open(srcPath);
} catch (IOException e) {
LOG.error(String.format("读取文件 %s 失败。", filePath), e);
}
return in;
}
public <T> void readFile(String filePath, StringAdjuster<T> adjuster, Collection<T> result){
InputStream inputStream = readFile(filePath);
if(inputStream != null){
InputStreamReader reader = new InputStreamReader(inputStream);
BufferedReader bufferedReader = new BufferedReader(reader);
String line;
try {
T t;
while((line = bufferedReader.readLine()) != null){
t = adjuster.doAdjust(line);
if(t != null)result.add(t);
}
} catch (IOException e) {
LOG.error(String.format("利用缓冲流读取文件 %s 失败。", filePath), e);
}finally {
IOUtils.closeQuietly(bufferedReader);
IOUtils.closeQuietly(reader);
IOUtils.closeQuietly(inputStream);
}
}
}
public List<String> readLines(String filePath){
return readLines(filePath, "UTF-8");
}
public List<String> readLines(String filePath, String encoding){
InputStream inputStream = readFile(filePath);
List<String> lines = null;
if(inputStream != null) {
try {
lines = IOUtils.readLines(inputStream, encoding);
} catch (IOException e) {
LOG.error(String.format("按行读取文件 %s 失败。", filePath), e);
}finally {
IOUtils.closeQuietly(inputStream);
}
}
return lines;
}
public List<FileStatus> findNewFileOrDirInDir(String dir, HdfsFileFilter filter,
final boolean onlyFile, final boolean onlyDir){
return findNewFileOrDirInDir(dir, filter, onlyFile, onlyDir, false);
}
public List<FileStatus> findNewFileOrDirInDir(String dir, HdfsFileFilter filter,
final boolean onlyFile, final boolean onlyDir, boolean recursive){
if(onlyFile && onlyDir){
FileStatus fileStatus = getFileStatus(dir);
if(fileStatus == null)return Lists.newArrayList();
if(isAccepted(fileStatus,filter)){
return Lists.newArrayList(fileStatus);
}
return Lists.newArrayList();
}
if(onlyFile){
return findNewFileInDir(dir, filter, recursive);
}
if(onlyDir){
return findNewDirInDir(dir, filter, recursive);
}
return Lists.newArrayList();
}
/**
* 查找一个文件夹中 新建的目录
* @param dir
* @param filter
* @return
*/
public List<FileStatus> findNewDirInDir(String dir, HdfsFileFilter filter){
return findNewDirInDir(new Path(dir), filter, false);
}
public List<FileStatus> findNewDirInDir(Path path, HdfsFileFilter filter){
return findNewDirInDir(path, filter, false);
}
public List<FileStatus> findNewDirInDir(String dir, HdfsFileFilter filter, boolean recursive){
return findNewDirInDir(new Path(dir), filter, recursive);
}
public List<FileStatus> findNewDirInDir(Path path, HdfsFileFilter filter, boolean recursive){
FileStatus[] files = null;
try {
files = fs.listStatus(path);
} catch (IOException e) {
LOG.error(String.format("获取目录 %s下的文件列表失败。", path), e);
}
if(files == null)return Lists.newArrayList();
List<FileStatus> paths = Lists.newArrayList();
List<String> res = Lists.newArrayList();
for(FileStatus fileStatus : files){
if (fileStatus.isDirectory()) {
if (isAccepted(fileStatus, filter)) {
paths.add(fileStatus);
res.add(fileStatus.getPath().toString());
}else if(recursive){
paths.addAll(findNewDirInDir(fileStatus.getPath(), filter, recursive));
}
}
}
LOG.info(String.format("从目录%s 找到满足条件%s 有如下 %s 个文件: %s",
path, filter,res.size(), res));
return paths;
}
/**
* 查找一个文件夹中 新建的文件
* @param dir
* @param filter
* @return
*/
public List<FileStatus> findNewFileInDir(String dir, HdfsFileFilter filter){
return findNewFileInDir(new Path(dir), filter, false);
}
public List<FileStatus> findNewFileInDir(String dir, HdfsFileFilter filter, boolean recursive){
return findNewFileInDir(new Path(dir), filter, recursive);
}
public List<FileStatus> findNewFileInDir(Path path, HdfsFileFilter filter){
return findNewFileInDir(path, filter, false);
}
public List<FileStatus> findNewFileInDir(Path path, HdfsFileFilter filter, boolean recursive){
FileStatus[] files = null;
try {
files = fs.listStatus(path);
} catch (IOException e) {
LOG.error(String.format("获取目录 %s下的文件列表失败。", path), e);
}
if(files == null)return Lists.newArrayList();
List<FileStatus> paths = Lists.newArrayList();
List<String> res = Lists.newArrayList();
for(FileStatus fileStatus : files){
if (fileStatus.isFile()) {
if (isAccepted(fileStatus, filter)) {
paths.add(fileStatus);
res.add(fileStatus.getPath().toString());
}
}else if(recursive){
paths.addAll(findNewFileInDir(fileStatus.getPath(), filter, recursive));
}
}
LOG.info(String.format("从目录%s 找到满足条件%s 有如下 %s 个文件: %s", path, filter,res.size(), res));
return paths;
}
private boolean isAccepted(String file, HdfsFileFilter filter) {
if(filter == null) return true;
FileStatus fileStatus = getFileStatus(file);
if(fileStatus == null)return false;
return isAccepted(fileStatus, filter);
}
private boolean isAccepted(FileStatus fileStatus, HdfsFileFilter filter) {
return filter == null ? true : filter.filter(fileStatus);
}
public long getModificationTime(Path path){
try {
FileStatus status = fs.getFileStatus(path);
return status.getModificationTime();
} catch (IOException e) {
LOG.error(String.format("获取路径 %s信息失败。", path), e);
}
return -1L;
}
public FileSystem getFs() {
return fs;
}
public static void main(String[] args) throws Exception {
// HdfsAdmin hdfsAdmin = HdfsAdmin.get();
// hdfsAdmin.mkdir("hdfs://hdp04.ultiwill.com:8020/test1111");
//System.out.println(hdfsAdmin.getFs().exists(new Path("hdfs://hdp04.ultiwill.com:8020/test")));
//hdfsAdmin.delete("hdfs://hdp04.ultiwill.com:8020/test1111");
//System.out.println("hdfsAdmin = " + );
// List<FileStatus> status = hdfsAdmin.findNewDirInDir("hdfs://hdp04.ultiwill.com:50070/hdp", null);
//System.out.println("status = " + status.size());
}
}
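HdfsAdmin 的 main 方法基本都被注释掉了,这里补一个最小使用示意(HDFS 地址与文件路径均为假设值):
package com.hsiehchou.hdfs;

import java.util.List;

//HdfsAdmin 使用示意,hdfs://hadoop1:8020 与具体路径均为假设
public class HdfsAdminExample {
    public static void main(String[] args) {
        HdfsAdmin hdfsAdmin = HdfsAdmin.get();
        //创建目录并上传本地文件
        hdfsAdmin.mkdir("hdfs://hadoop1:8020/test/demo");
        hdfsAdmin.uploadFile("/tmp/demo.txt", "hdfs://hadoop1:8020/test/demo/demo.txt");
        //按行读取刚上传的文件
        List<String> lines = hdfsAdmin.readLines("hdfs://hadoop1:8020/test/demo/demo.txt");
        System.out.println(lines);
    }
}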
HdfsFileFilter.java
package com.hsiehchou.hdfs;
import com.hsiehchou.common.filter.Filter;
import org.apache.hadoop.fs.FileStatus;
public abstract class HdfsFileFilter implements Filter<FileStatus> {
}
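Filter 接口定义在 xz_bigdata_common 中,本节没有贴出。结合 HdfsAdmin.isAccepted 里 filter.filter(fileStatus) 的用法可以推断其核心是 boolean filter(T t),下面是一个按修改时间过滤的具体实现作为示意(过滤条件为假设):
package com.hsiehchou.hdfs;

import org.apache.hadoop.fs.FileStatus;

//示意:过滤出"最近一小时内有变化"的文件/目录
public class RecentModifiedFilter extends HdfsFileFilter {
    private final long sinceMillis = System.currentTimeMillis() - 60 * 60 * 1000L;

    @Override
    public boolean filter(FileStatus fileStatus) {
        return fileStatus.getModificationTime() >= sinceMillis;
    }
}
配合 HdfsAdmin 使用,例如:HdfsAdmin.get().findNewFileInDir("/apps/hive/warehouse/external/qq", new RecentModifiedFilter(), true)。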
com/hsiehchou/hive
HiveConf.java
package com.hsiehchou.hive;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.hive.HiveContext;
import java.util.Iterator;
import java.util.Map;
public class HiveConf {
//private static String DEFUALT_CONFIG = "spark/hive/hive-server-config";
private static HiveConf hiveConf;
private static HiveContext hiveContext;
private HiveConf(){
}
public static HiveConf getHiveConf(){
if(hiveConf==null){
synchronized (HiveConf.class){
if(hiveConf==null){
hiveConf=new HiveConf();
}
}
}
return hiveConf;
}
public static HiveContext getHiveContext(SparkContext sparkContext){
if(hiveContext==null){
synchronized (HiveConf.class){
if(hiveContext==null){
hiveContext = new HiveContext(sparkContext);
Configuration conf = new Configuration();
conf.addResource("spark/hive/hive-site.xml");
Iterator<Map.Entry<String, String>> iterator = conf.iterator();
while (iterator.hasNext()) {
Map.Entry<String, String> next = iterator.next();
hiveContext.setConf(next.getKey(), next.getValue());
}
hiveContext.setConf("spark.sql.parquet.mergeSchema", "true");
}
}
}
return hiveContext;
}
}
6、小文件合并
scala/com/hsiehchou/spark/streaming/kafka/kafka2hdfs
CombineHdfs.scala—合并HDFS小文件任务
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs
import com.hsiehchou.hdfs.HdfsAdmin
import com.hsiehchou.spark.common.SparkContextFactory
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.Logging
import org.apache.spark.sql.{SQLContext, SaveMode}
import scala.collection.JavaConversions._
/**
* 合并HDFS小文件任务
*/
object CombineHdfs extends Serializable with Logging{
def main(args: Array[String]): Unit = {
// val sparkContext = SparkContextFactory.newSparkBatchContext("CombineHdfs")
val sparkContext = SparkContextFactory.newSparkLocalBatchContext("CombineHdfs")
//创建一个 sparkSQL
val sqlContext: SQLContext = new SQLContext(sparkContext)
//遍历表 就是遍历HIVE表
HiveConfig.tables.foreach(table=>{
//获取HDFS文件目录
//例如 /apps/hive/warehouse/external/mail
val table_path =s"${HiveConfig.hive_root_path}$table"
//通过sparkSQL 加载 这些目录的文件
val tableDF = sqlContext.read.load(table_path)
//先获取原来目录中的所有文件(HDFS文件 API)
val fileSystem:FileSystem = HdfsAdmin.get().getFs
//通过globStatus 获取目录下的正则匹配文件
//fileSystem.listFiles()
val arrayFileStatus = fileSystem.globStatus(new Path(table_path+"/part*"))
//stat2Paths将文件状态转为文件路径 这个文件路径是用来删除的
val paths = FileUtil.stat2Paths(arrayFileStatus)
//写入合并文件 //repartition 需要根据生产中实际情况去定义
tableDF.repartition(1).write.mode(SaveMode.Append).parquet(table_path)
println("写入" + table_path +"成功")
//删除小文件
paths.foreach(path =>{
HdfsAdmin.get().getFs.delete(path)
println("删除文件" + path + "成功")
})
})
}
}
7、定时任务
命令行输入:crontab -e
内容(crontab 中整条命令要写在同一行,spark-submit 建议写绝对路径;下面表示每天凌晨1点执行一次合并任务):
0 1 * * * spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.CombineHdfs /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
说明:crontab 的格式为 * * * * * 加上要执行的任务,五个“*”的含义如下:
项目 | 含义 | 范围 |
---|---|---|
第一个“*” | 一小时当中的第几分钟(分) | 0-59 |
第二个“*” | 一天当中的第几小时(时) | 0-23 |
第三个“*” | 一个月当中的第几天(天) | 1-31 |
第四个“*” | 一年当中的第几月(月) | 1-12 |
第五个“*” | 一周当中的星期几(周) | 0-7(0和7都代表星期日) |
8、合并小文件截图
9、hive命令
show tables;
hdfs dfs -ls /apps/hive/warehouse/external
hdfs dfs -rm -r /apps/hive/warehouse/external/mail
drop table mail;
desc qq;
select * from qq limit 1;
select count(*) from qq;
启动 /usr/bin 下面的zookeeper客户端
zookeeper-client
删除zookeeper里面的消费者数据
rmr /consumers/WarningStreamingTask2/offsets
rmr /consumers/Kafka2HiveTest/offsets
rmr /consumers/DataRelationStreaming1/offsets
十一、Spark—Kafka2Hbase
1、数据关联
(1)为什么需要关联
问题:我们不能充分了解数据之间的关联关系,这类关联需求在公司中应用得非常多。
传统做法是离线关联:数据放在 mysql 里,通过关联字段去关联。但是如果数据量非常大、关联表非常多,就处理不了。
而且数据零散时只能从单一维度去看数据,看到的面比较窄;如果需要从多个维度分析,关联成本比较大。
建立数据之间的关联关系,实现关联查询的毫秒级响应;
另一个方面,可以为数据挖掘,机器学习提供训练数据。
后面进行机器学习的时候,都需要从多维度对数据进行分析和建模。
(2)HBASE 中只要rowkey一样,那么它们就是同一条数据(不同来源的数据会合并到同一行,见下面的示意代码)
QQ
aa-aa-aa-aa-aa-aa 666666
微信
aa-aa-aa-aa-aa-aa weixin
邮箱
aa-aa-aa-aa-aa-aa 666666@qq.com
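下面用一小段示意代码说明这一点:对同一个 rowkey 写入来自不同数据源的列,HBase 会把它们合并成同一行(表名 test:relation、列名均为示例):
package com.hsiehchou.hbase.insert;

import org.apache.hadoop.hbase.client.Put;

//示意:同一 rowkey 的多次写入在 HBase 中合并为同一条逻辑数据
public class SameRowkeyExample {
    public static void main(String[] args) throws Exception {
        Put put = new Put("aa-aa-aa-aa-aa-aa".getBytes());
        put.addColumn("cf".getBytes(), "qq".getBytes(), "666666".getBytes());
        put.addColumn("cf".getBytes(), "wechat".getBytes(), "weixin".getBytes());
        put.addColumn("cf".getBytes(), "send_mail".getBytes(), "666666@qq.com".getBytes());
        HBaseInsertHelper.put("test:relation", put);
    }
}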
(3)如何关联
一对一的情况 :
https://blog.csdn.net/shujuelin/article/details/83657485
使用HBASE写入特性
比如 MAC1 1789932321
MAC1 88888@qq.com
MAC1 88888
一对多的情况怎么处理
使用多版本
aa-aa-aa-aa-aa-aa 666666
aa-aa-aa-aa-aa-aa 777777
(4)一对多
使用多版本存一对多的关系。
如果使用默认的时间戳做版本:插入一个777777是一个版本,再插入一个777777又是一个版本,同一个值会重复出现。
所以需要自定义版本号,保证同一个值对应唯一的版本。
做法是通过 "888888".hashCode() & Integer.MAX_VALUE 把值映射成一个固定的正整数作为版本号,如下面的示意代码。
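下面是自定义版本号的一个最小演示(字段名和值为示例):
package com.hsiehchou.hbase.insert;

//示意:用 (字段名 + 字段值).hashCode() & Integer.MAX_VALUE 生成确定的正整数版本号
//同一个值重复写入会落在同一个版本上,从而避免多版本里出现重复数据
public class VersionNumExample {
    public static void main(String[] args) {
        long v1 = ("phone" + "777777").hashCode() & Integer.MAX_VALUE;
        long v2 = ("phone" + "777777").hashCode() & Integer.MAX_VALUE;
        long v3 = ("phone" + "888888").hashCode() & Integer.MAX_VALUE;
        System.out.println(v1 == v2); //true,重复值覆盖同一版本
        System.out.println(v1 == v3); //几乎总是 false,不同值对应不同版本(极小概率哈希冲突)
    }
}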
(5)如何实现hbase多字段查询
往主关联表 test:relation 里面写入数据 rowkey=>aa-aa-aa-aa-aa-aa version=>1637094383 类型phone_mac value=>aa-aa-aa-aa-aa-aa
往二级索引表 test:phone_mac里面写入数据 rowkey=>aa-aa-aa-aa-aa-aa version=>1736188717 value=>aa-aa-aa-aa-aa-aa
查询不直接查主关联表,因为查询字段不在主键里面,没办法查或者性能非常低下。
查询分为两步rowkey查询(两步的示意代码见下):
第一步,通过查询字段到对应的二级索引表里面去找主关联表的ROWKEY
第二步,通过主关联表的ROWKEY获取HBASE中的全量数据
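下面是两步查询的一个简化示意(只演示一对一的情况,一对多时需要按多版本读取;手机号等取值为假设):
package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.config.HBaseTableUtil;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

//示意:先查二级索引表 test:phone 拿到 mac,再用 mac 查主关联表 test:relation
public class SecondaryIndexQueryExample {
    public static void main(String[] args) throws Exception {
        //第一步:用手机号到索引表里找主关联表的 rowkey(即 phone_mac)
        Table phoneIndex = HBaseTableUtil.getTable("test:phone");
        Result indexResult = phoneIndex.get(new Get("18612345678".getBytes()));
        byte[] mac = indexResult.getValue("cf".getBytes(), "phone_mac".getBytes());
        HBaseTableUtil.close(phoneIndex);
        if (mac == null) {
            System.out.println("索引表中没有该手机号");
            return;
        }
        //第二步:用 rowkey(mac) 到主关联表取全量关联数据
        Table relation = HBaseTableUtil.getTable("test:relation");
        Result all = relation.get(new Get(mac));
        for (Cell cell : all.rawCells()) {
            System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)) + "=" + Bytes.toString(CellUtil.cloneValue(cell)));
        }
        HBaseTableUtil.close(relation);
    }
}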
WIFI 已经入库的情况下,手机号也必须已经入库了,才能找到
假如WIFI的手机号还没有入库
如果是基础数据先过来,没有mac,也就没有主键
Card | phone |
---|---|
400000000000000 | 18612345678 |
关联
Phone | value (识别这个字段是身份证才可以) |
---|---|
18612345678 | 400000000000000 |
1)因为检索的时候都是通过索引表直接找MAC,这里却混入了身份证这类非MAC的值
2)所以后续要对这类数据做一次合并
(6)关联及二级索引示意
(7)如何使用ES建立二级索引
如果hbase里面有100个字段,存放的是全量信息,但是只有20个字段参与查询、检索,那么我们可以把这20个字段单独提出来存放到es中,因为ES对多字段、多条件查询非常灵活。所以我们可以先在ES中按条件进行检索,根据检索结果拿到hbase的rowkey,然后再通过rowkey到hbase里面获取全量信息(示意代码见下)。
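下面是一个简化示意:假设已经在 ES 中按条件查到了一批 rowkey(这里直接写死),再批量回查 HBase 主关联表取全量数据:
package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.config.HBaseTableUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

//示意:ES 查询部分省略,rowkey 列表为假设值,实际来自 ES 检索结果
public class EsRowkey2HbaseExample {
    public static void main(String[] args) throws Exception {
        List<String> rowkeysFromEs = Arrays.asList("aa-aa-aa-aa-aa-aa", "bb-bb-bb-bb-bb-bb");
        List<Get> gets = new ArrayList<>();
        for (String rowkey : rowkeysFromEs) {
            gets.add(new Get(rowkey.getBytes()));
        }
        Table relation = HBaseTableUtil.getTable("test:relation");
        //批量 get,一次拿回所有命中 rowkey 的全量信息
        Result[] results = relation.get(gets);
        for (Result result : results) {
            System.out.println(result);
        }
        HBaseTableUtil.close(relation);
    }
}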
(8)Hbase 预分区
主要是根据rowkey分布来进行预分区
分区主要是为了防止热点问题
以relation表为例
这个表的rowkey就是mac
phone_mac 都是以0-9 a-f开头的
device_mac 都是以0-9 a-z开头的
Hbase 的rowkey是按字典序排序的,所以可以按rowkey的首字符预先切分region(生成split key的示意代码见下)
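SpiltRegionUtil.getSplitKeysBydinct 的实现不在本节中,下面按 0-9、a-f 首字符给出一个生成 split key 的简化示意(切分粒度为假设),生成的 splits 可以传给 HBaseTableUtil.createTable 的分区参数:
package com.hsiehchou.hbase.spilt;

//示意:phone_mac 以 0-9、a-f 开头,可按首字符生成 16 个 region 的 split key
public class SplitKeysExample {
    public static byte[][] hexSplitKeys() {
        //15 个 split key 切出 16 个 region,第一个 region 覆盖 '0' 开头的 rowkey
        char[] heads = "123456789abcdef".toCharArray();
        byte[][] splits = new byte[heads.length][];
        for (int i = 0; i < heads.length; i++) {
            splits[i] = new byte[]{(byte) heads[i]};
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : hexSplitKeys()) {
            System.out.println(new String(split));
        }
    }
}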
(9)自定义版本号
通过这样的转换我们可以精确定位到数据的某个版本,然后可以根据版本号对数据进行多版本删除。例如字符串 aaaaaaaa 经哈希映射后会得到一个类似 156511 的正整数版本号。
2、DataRelationStreaming—数据关联
DataRelationStreaming.scala
package com.hsiehchou.spark.streaming.kafka.kafka2hbase
import java.util.Properties
import com.hsiehchou.common.config.ConfigUtil
import com.hsiehchou.hbase.config.HBaseTableUtil
import com.hsiehchou.hbase.insert.HBaseInsertHelper
import com.hsiehchou.hbase.spilt.SpiltRegionUtil
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaManager
object DataRelationStreaming extends Serializable with Logging{
// 读取需要关联的配置文件字段
// phone_mac,phone,username,send_mail,imei,imsi
val relationFields = ConfigUtil.getInstance()
.getProperties("spark/relation.properties")
.get("relationfield")
.toString
.split(",")
def main(args: Array[String]): Unit = {
//初始化hbase表
//initRelationHbaseTable(relationFields)
val ssc = SparkContextFactory.newSparkLocalStreamingContext("DataRelationStreaming", java.lang.Long.valueOf(10),1)
// val ssc = SparkContextFactory.newSparkStreamingContext("DataRelationStreaming", java.lang.Long.valueOf(10))
val kafkaConfig: Properties = ConfigUtil.getInstance().getProperties("kafka/kafka-server-config.properties")
val topics = "chl_test7".split(",")
val kafkaDS = new KafkaManager(Spark_Kafka_ConfigUtil
.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"),
"DataRelationStreaming2"))
.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
.persist(StorageLevel.MEMORY_AND_DISK)
kafkaDS.foreachRDD(rdd=>{
rdd.foreachPartition(partion=>{
//对partion进行遍历
while (partion.hasNext){
//获取每一条流数据
val map = partion.next()
//获取mac 主键
var phone_mac:String = map.get("phone_mac")
//获取所有关联字段 //phone_mac,phone,username,send_mail,imei,imsi
relationFields.foreach(relationFeild =>{
//relationFields 是需要进行关联处理的字段,所以要判断
//map中是不是包含这个字段,如果包含的话,取出来进行处理
if(map.containsKey(relationFeild)){
//创建主关联,并遍历关联字段进行关联
val put = new Put(phone_mac.getBytes())
//取关联字段的值
//TODO 到这里 主关联表的 主键和值都有了 然后封装成PUT写入hbase主关联表就行了
val value = map.get(relationFeild)
//自定义版本号 通过 (表字段名 + 字段值 取hashCOde)
//因为值有可能是字符串,但是版本号必须是long类型,所以这里我们需要
//将字符串映射成唯一的数字,而且必须是正整数
val versionNum = (relationFeild+value).hashCode() & Integer.MAX_VALUE
put.addColumn("cf".getBytes(), Bytes.toBytes(relationFeild),versionNum ,Bytes.toBytes(value.toString))
HBaseInsertHelper.put("test:relation",put)
println(s"往主关联表 test:relation 里面写入数据 rowkey=>${phone_mac} version=>${versionNum} 类型${relationFeild} value=>${value}")
// 建立二级索引
// 使用关联字段的值作为二级索引的rowkey
// 二级索引就是把这个字段的值作为索引表rowkey
// 把这个字段的mac做为索引表的值
val put_2 = new Put(value.getBytes())//把这个字段的值作为索引表rowkey
val table_name = s"test:${relationFeild}"//往索引表里面去写
//使用主表的rowkey 就是 取hash作为二级索引的版本号
val versionNum_2 = phone_mac.hashCode() & Integer.MAX_VALUE
put_2.addColumn("cf".getBytes(), Bytes.toBytes("phone_mac"),versionNum_2 ,Bytes.toBytes(phone_mac.toString))
HBaseInsertHelper.put(table_name,put_2)
println(s"往二级索表 ${table_name}里面写入数据 rowkey=>${value} version=>${versionNum_2} value=>${phone_mac}")
}
})
}
})
})
ssc.start()
ssc.awaitTermination()
}
def initRelationHbaseTable(relationFields:Array[String]): Unit ={
//初始化总关联表
val relation_table = "test:relation"
HBaseTableUtil.createTable(relation_table,
"cf",
true,
-1,
100,
SpiltRegionUtil.getSplitKeysBydinct)
//HBaseTableUtil.deleteTable(relation_table)
//遍历所有关联字段,根据字段创建二级索引表
relationFields.foreach(field=>{
val hbase_table = s"test:${field}"
HBaseTableUtil.createTable(hbase_table, "cf", true, -1, 100, SpiltRegionUtil.getSplitKeysBydinct)
// HBaseTableUtil.deleteTable(hbase_table)
})
}
}
3、com.hsiehchou.spark.streaming
common/SparkContextFactory.scala
package com.hsiehchou.spark.common
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{Accumulator, SparkContext}
object SparkContextFactory {
def newSparkBatchContext(appName:String = "sparkBatch") : SparkContext = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
new SparkContext(sparkConf)
}
def newSparkLocalBatchContext(appName:String = "sparkLocalBatch" , threads : Int = 2) : SparkContext = {
val sparkConf = SparkConfFactory.newSparkLoalConf(appName, threads)
sparkConf.set("","")
new SparkContext(sparkConf)
}
def getAccumulator(appName:String = "sparkBatch") : Accumulator[Int] = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
val accumulator: Accumulator[Int] = new SparkContext(sparkConf).accumulator(0,"")
accumulator
}
/**
* 创建本地流streamingContext
* @param appName appName
* @param batchInterval 多少秒读取一次
* @param threads 开启多少个线程
* @return
*/
def newSparkLocalStreamingContext(appName:String = "sparkStreaming" ,
batchInterval:Long = 30L ,
threads : Int = 4) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkLocalConf(appName, threads)
// sparkConf.set("spark.streaming.receiver.maxRate","10000")
sparkConf.set("spark.streaming.kafka.maxRatePerPartition","1")
new StreamingContext(sparkConf, Seconds(batchInterval))
}
/**
* 创建集群模式streamingContext
* 这里不设置线程数,在submit中指定
* @param appName
* @param batchInterval
* @return
*/
def newSparkStreamingContext(appName:String = "sparkStreaming" , batchInterval:Long = 30L) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkStreamingConf(appName)
new StreamingContext(sparkConf, Seconds(batchInterval))
}
def startSparkStreaming(ssc:StreamingContext){
ssc.start()
ssc.awaitTermination()
ssc.stop()
}
}
streaming/kafka/Spark_Kafka_ConfigUtil.scala
package com.hsiehchou.spark.streaming.kafka
import org.apache.spark.Logging
object Spark_Kafka_ConfigUtil extends Serializable with Logging{
def getKafkaParam(brokerList:String,groupId : String): Map[String,String]={
val kafkaParam=Map[String,String](
"metadata.broker.list" -> brokerList,
"auto.offset.reset" -> "smallest",
"group.id" -> groupId,
"refresh.leader.backoff.ms" -> "1000",
"num.consumer.fetchers" -> "8")
kafkaParam
}
}
4、com/hsiehchou/common/config/ConfigUtil
ConfigUtil.java
package com.hsiehchou.common.config;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
public class ConfigUtil {
private static Logger LOG = LoggerFactory.getLogger(ConfigUtil.class);
private static ConfigUtil configUtil;
public static ConfigUtil getInstance(){
if(configUtil == null){
configUtil = new ConfigUtil();
}
return configUtil;
}
public Properties getProperties(String path){
Properties properties = new Properties();
try {
LOG.info("开始加载配置文件" + path);
InputStream insss = this.getClass().getClassLoader().getResourceAsStream(path);
properties = new Properties();
properties.load(insss);
} catch (IOException e) {
LOG.info("加载配置文件" + path + "失败");
LOG.error(null,e);
}
LOG.info("加载配置文件" + path + "成功");
System.out.println("文件内容:"+properties);
return properties;
}
public static void main(String[] args) {
ConfigUtil instance = ConfigUtil.getInstance();
Properties properties = instance.getProperties("common/datatype.properties");
//Properties properties = instance.getProperties("spark/relation.properties");
// properties.get("relationfield");
System.out.println(properties);
}
}
5、构建模块—xz_bigdata_hbase
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_hbase</artifactId>
<name>xz_bigdata_hbase</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<hbase.version>1.2.0</hbase.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>guava</artifactId>
<groupId>com.google.guava</groupId>
</exclusion>
<exclusion>
<artifactId>zookeeper</artifactId>
<groupId>org.apache.zookeeper</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>servlet-api-2.5</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>
com/hsiehchou/hbase/config/HBaseConf.java
package com.hsiehchou.hbase.config;
import com.hsiehchou.hbase.spilt.SpiltRegionUtil;
import org.apache.commons.configuration.CompositeConfiguration;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.io.Serializable;
public class HBaseConf implements Serializable {
private static final long serialVersionUID = 1L;
private static final Logger LOG = Logger.getLogger(HBaseConf.class);
private static final String HBASE_SERVER_CONFIG = "hbase/hbase-server-config.properties";
private static final String HBASE_SITE = "hbase/hbase-site.xml";
private volatile static HBaseConf hbaseConf;
private CompositeConfiguration hbase_server_config;
public CompositeConfiguration getHbase_server_config() {
return hbase_server_config;
}
public void setHbase_server_config(CompositeConfiguration hbase_server_config) {
this.hbase_server_config = hbase_server_config;
}
//hbase 配置文件
private Configuration configuration;
//hbase 连接
private volatile transient Connection conn;
/**
* 初始化HBaseConf的时候加载配置文件
*/
private HBaseConf() {
hbase_server_config = new CompositeConfiguration();
//加载配置文件
loadConfig(HBASE_SERVER_CONFIG,hbase_server_config);
//初始化连接
getHconnection();
}
//获取连接
public Configuration getConfiguration(){
if(configuration==null){
configuration = HBaseConfiguration.create();
configuration.addResource(HBASE_SITE);
LOG.info("加载配置文件" + HBASE_SITE + "成功");
}
return configuration;
}
public BufferedMutator getBufferedMutator(String tableName) throws IOException {
return getHconnection().getBufferedMutator(TableName.valueOf(tableName));
}
public Connection getHconnection(){
if(conn==null){
//获取配置文件
getConfiguration();
synchronized (HBaseConf.class) {
if (conn == null) {
try {
conn = ConnectionFactory.createConnection(configuration);
} catch (IOException e) {
LOG.error(String.format("获取hbase的连接失败 参数为: %s", toString()), e);
}
}
}
}
return conn;
}
/**
* 加载配置文件
* @param path
* @param configuration
*/
private void loadConfig(String path,CompositeConfiguration configuration) {
try {
LOG.info("加载配置文件 " + path);
configuration.addConfiguration(new PropertiesConfiguration(path));
LOG.info("加载配置文件" + path +"成功。 ");
} catch (ConfigurationException e) {
LOG.error("加载配置文件 " + path + "失败", e);
}
}
/**
* 单例 初始化HBaseConf
* @return
*/
public static HBaseConf getInstance() {
if (hbaseConf == null) {
synchronized (HBaseConf.class) {
if (hbaseConf == null) {
hbaseConf = new HBaseConf();
}
}
}
return hbaseConf;
}
public static void main(String[] args) {
String hbase_table = "test:chl_test2";
HBaseTableUtil.createTable(hbase_table, "cf", true, -1, 1, SpiltRegionUtil.getSplitKeysBydinct());
/* Connection hconnection = HBaseConf.getInstance().getHconnection();
Connection hconnection1 = HBaseConf.getInstance().getHconnection();
System.out.println(hconnection);
System.out.println(hconnection1);*/
}
}
com/hsiehchou/hbase/config/HBaseTableFactory.java
package com.hsiehchou.hbase.config;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Table;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.Serializable;
public class HBaseTableFactory implements Serializable {
private static final long serialVersionUID = -1071596337076137201L;
private static final Logger LOG = LoggerFactory.getLogger(HBaseTableFactory.class);
private HBaseConf conf;
private transient Connection conn ;
private boolean isReady = true;
public HBaseTableFactory(){
conf = HBaseConf.getInstance();
conn = conf.getHconnection();
if(conn == null){
isReady = false;
LOG.warn("HBase 连接没有启动。");
}
}
public HBaseTableFactory(Connection conn){
this.conn = conn;
}
/**
* 根据表名创建 表的实例
* @param tableName
* @return
* @throws IOException
* HTableInterface
*/
public Table getHBaseTableInstance(String tableName) throws IOException{
if(conn == null){
if(conf == null){
conf = HBaseConf.getInstance();
isReady = true;
LOG.warn("HBaseConf为空,重新初始化。");
}
synchronized (HBaseTableFactory.class) {
if(conn == null) {
conn = conf.getHconnection();
LOG.warn("初始 hbase Connection 为空 , 获取 Connection成功。");
}
}
}
return isReady ? conn.getTable(TableName.valueOf(tableName)) : null;
}
public HTable getHTable(String tableName) throws IOException{
return (HTable) getHBaseTableInstance(tableName);
}
public BufferedMutator getBufferedMutator(String tableName) throws IOException {
return getConf().getBufferedMutator(tableName);
}
public boolean isReady() {
return isReady;
}
private HBaseConf getConf(){
if(conf == null){
conf = HBaseConf.getInstance();
}
return conf;
}
public void close() throws IOException{
conn.close();
conn = null;
}
}
com/hsiehchou/hbase/config/HBaseTableUtil.java
package com.hsiehchou.hbase.config;
import com.google.common.collect.Sets;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.*;
import static com.google.common.base.Preconditions.checkArgument;
public class HBaseTableUtil {
private static final Logger LOG = LoggerFactory.getLogger(HBaseTableUtil.class);
private static final String COPROCESSORCLASSNAME = "org.apache.hadoop.hbase.coprocessor.AggregateImplementation";
private static HBaseConf conf = HBaseConf.getInstance() ;
private HBaseTableUtil(){}
/**
* 获取hbase 表连接
* @param tableName
* @return
*/
public static Table getTable(String tableName){
Table table =null;
if(tableExists(tableName)){
try {
table = conf.getHconnection().getTable(TableName.valueOf(tableName));
} catch (IOException e) {
LOG.error(null,e);
}
}
return table;
}
public static void close(Table table){
if(table != null) {
try {
table.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
/**
* 判断 HBase中是否存在 名为 tableName 的表
* @param tableName
* @return boolean
*/
public static boolean tableExists(String tableName){
boolean isExists = false;
try {
isExists = conf.getHconnection().getAdmin().tableExists(TableName.valueOf(tableName));
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}
return isExists;
}
/**
* 删除表
* @param tableName
* @return
*/
public static boolean deleteTable(String tableName){
boolean status = false;
TableName name = TableName.valueOf(tableName);
try {
Admin admin = conf.getHconnection().getAdmin();
if(admin.tableExists(name)){
if(!admin.isTableDisabled(name)){
admin.disableTable(name);
}
admin.deleteTable(name);
}else{
LOG.warn(" HBase中不存在 表 " + tableName);
}
admin.close();
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}
return status;
}
/**
* 清空表
* @param tableName
* @return
*/
public static boolean truncateTable(String tableName){
boolean status = false;
TableName name = TableName.valueOf(tableName);
try {
Admin admin = conf.getHconnection().getAdmin();
if(admin.tableExists(name)){
if(admin.isTableAvailable(name)){
admin.disableTable(name);
}
admin.truncateTable(name, true);
}else{
LOG.warn(" HBase中不存在 表 " + tableName);
}
admin.close();
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}
return status;
}
/**
* 创建HBase表
* @param tableName
* @param cf 列族名
* @param inMemory
* @param ttl ttl < 0 则为永久保存
*/
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, COPROCESSORCLASSNAME);
return createTable(htd);
}
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion, boolean useSNAPPY){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY , COPROCESSORCLASSNAME);
return createTable(htd);
}
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion, boolean useSNAPPY, byte[][] splits){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY, COPROCESSORCLASSNAME);
return createTable(htd , splits);
}
/**
* @param tableName 表名
* @param cf 列簇
* @param inMemory 是否存在内存
* @param ttl 数据过期时间
* @param maxVersion 最大版本
* @param splits 分区
* @return
*/
public static boolean createTable(String tableName,
String cf,
boolean inMemory,
int ttl,
int maxVersion,
byte[][] splits){
//返回表说明
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, COPROCESSORCLASSNAME);
//通过HTableDescriptor 和 splits 分区策略来定义表
return createTable(htd , splits);
}
public static List<String> listTables(){
List<String> list = new ArrayList<String>();
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
TableName[] listTableNames = admin.listTableNames();
for( TableName t : listTableNames ){
list.add( t.getNameAsString() );
}
} catch(IOException e ) {
LOG.error("创建HBase表失败。", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return list;
}
/**
* 列出所有表
* @param reg
* @return
*/
public static List<String> listTables(String reg){
List<String> list = new ArrayList<String>();
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
TableName[] listTableNames = admin.listTableNames(reg);
for(TableName t : listTableNames){
list.add(t.getNameAsString());
}
} catch(IOException e) {
LOG.error("创建HBase表失败。", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return list;
}
/**
* 创建HBase表
* @param tableName
* @param cf 列族名
* @param inMemory
* @param ttl ttl < 0 则为永久保存
*/
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl , int maxVersion, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, coprocessorClassNames);
return createTable(htd);
}
public static boolean createTable( String tableName, String cf, boolean inMemory, int ttl, int maxVersion, boolean useSNAPPY, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY, coprocessorClassNames);
return createTable(htd);
}
public static boolean createTable( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion , boolean useSNAPPY ,byte[][] splits, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY ,coprocessorClassNames);
return createTable(htd,splits );
}
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion, byte[][] splits, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, coprocessorClassNames);
return createTable(htd,splits );
}
/**
* 通过HTableDescriptor 和 分区 来构建hbase
* @param htd
* @param splits
* @return
*/
public static boolean createTable(HTableDescriptor htd, byte[][] splits){
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
TableName tableName = htd.getTableName();
boolean exist = admin.tableExists(tableName);
if(exist){
LOG.error("表"+tableName.getNameAsString() + "已经存在");
}else{
//使用Admin进行创建表
admin.createTable(htd, splits);
}
} catch(IOException e ) {
LOG.error("创建HBase表失败。", e);
return false;
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return true;
}
public static boolean createTable(HTableDescriptor htd){
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
if(admin.tableExists(htd.getTableName())){
LOG.info("表" + htd.getTableName() + "已经存在");
}else{
admin.createTable(htd);
}
} catch(IOException e ) {
LOG.error("创建HBase表失败。", e);
return false;
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return true;
}
/**
* 创建命名空间
* @param nameSpace
* @return
*/
public static boolean createNameSpace(String nameSpace){
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
NamespaceDescriptor[] listNamespaceDescriptors = admin.listNamespaceDescriptors();
boolean exist = false;
for(NamespaceDescriptor namespaceDescriptor : listNamespaceDescriptors){
if(namespaceDescriptor.getName().equals(nameSpace)){
exist = true;
}
}
if(!exist) admin.createNamespace(NamespaceDescriptor.create(nameSpace).build());
} catch(IOException e ) {
LOG.error("创建HBase命名空间失败。", e);
return false;
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return true;
}
/**
* 为 HBase中的表 tableName添加 协处理器 coprocessorClassName
* @param tableName
* @param coprocessorClassName 必须是已经存在与HBase集群中
* @return boolean
*/
public static boolean addCoprocessorClassForTable(String tableName,String coprocessorClassName){
boolean status = false;
TableName name = TableName.valueOf(tableName);
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
HTableDescriptor htd = admin.getTableDescriptor(name);
if(!htd.hasCoprocessor(coprocessorClassName)){
htd.addCoprocessor(coprocessorClassName);
admin.disableTable(name);
admin.modifyTable(name, htd);
admin.enableTable(name);
}else{
LOG.warn(String.format("表 %s中已经存在协处理器%s", tableName, coprocessorClassName));
}
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return status;
}
/**
* 为HBase中的表 tableName添加指定位置的 协处理器 jar
* @param tableName
* @param coprocessorClassName jar中的具体的协处理器
* @param jarPath hdfs的路径
* @param level 执行级别
* @param kvs 运行参数 可以为 null
* @return boolean
*/
public static boolean addCoprocessorJarForTable(String tableName, String coprocessorClassName,String jarPath,int level ,Map<String, String> kvs ){
boolean status = false;
TableName name = TableName.valueOf(tableName);
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
HTableDescriptor htd = admin.getTableDescriptor(name);
if(!htd.hasCoprocessor(coprocessorClassName)){
admin.disableTable(name);
htd.addCoprocessor(coprocessorClassName, new Path(jarPath), level, kvs);
admin.modifyTable(name, htd);
admin.enableTable(name);
}else{
LOG.warn(String.format("表 %s中已经存在协处理器%s", tableName, coprocessorClassName));
}
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return status;
}
/**
* @param tableName
* @param cf
* @param inMemory
* @param ttl
* @param maxVersion
* @param coprocessorClassNames
* @return
*/
public static HTableDescriptor createHTableDescriptor( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion ,String ... coprocessorClassNames ){
return createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, true, coprocessorClassNames);
}
/**
* @param tableName
* @param cf
* @param inMemory
* @param ttl
* @param maxVersion
* @param useSNAPPY
* @param coprocessorClassNames
* @return
*/
public static HTableDescriptor createHTableDescriptor( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion , boolean useSNAPPY , String ... coprocessorClassNames ){
// 1.创建命名空间
String[] split = tableName.split(":");
if(split.length==2){
createNameSpace(split[0]);
}
// 2.添加协处理器
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));
for( String coprocessorClassName : coprocessorClassNames ){
try {
htd.addCoprocessor(coprocessorClassName);
} catch (IOException e1) {
LOG.error("为表" + tableName + " 添加协处理器失败。 ", e1);
}
}
// 创建HColumnDescriptor
HColumnDescriptor hcd = new HColumnDescriptor(cf);
if( maxVersion > 0 )
//定义最大版本号
hcd.setMaxVersions(maxVersion);
/**
* 设置布隆过滤器
* 默认是NONE 是否使用布隆过滤及使用何种方式
* 布隆过滤可以每列族单独启用
* Default = ROW 对行键进行布隆过滤。
* 对 ROW,行键的哈希在每次插入行时将被添加到布隆。
* 对 ROWCOL,行键 + 列族 + 列限定符的哈希将在每次插入行时添加到布隆
* 使用方法: create 'table', {BLOOMFILTER => 'ROW'}
* 启用布隆过滤可以节省读磁盘过程,可以有助于降低读取延迟
* */
hcd.setBloomFilterType(BloomType.ROWCOL);
/**
* hbase在LRU缓存基础之上采用了分层设计,整个blockcache分成了三个部分,分别是single、multi和inMemory。三者区别如下:
* single:如果一个block第一次被访问,放在该优先队列中;
* multi:如果一个block被多次访问,则从single队列转移到multi队列
* inMemory:优先级最高,常驻cache,因此一般只有hbase系统的元数据,如meta表之类的才会放到inMemory队列中。普通的hbase列族也可以指定IN_MEMORY属性,方法如下:
* create 'table', {NAME => 'f', IN_MEMORY => true}
* 修改上表的inmemory属性,方法如下:
* alter 'table',{NAME=>'f',IN_MEMORY=>true}
* */
hcd.setInMemory(inMemory);
hcd.setScope(1);
/**
* 数据量大,边压边写也会提升性能的,毕竟IO是大数据的最严重的瓶颈,
* 哪怕使用了SSD也是一样。众多的压缩方式中,推荐使用SNAPPY。从压缩率和压缩速度来看,
* 性价比最高。
**/
if(useSNAPPY)hcd.setCompressionType(Compression.Algorithm.SNAPPY);
//默认为NONE
//如果数据存储时设置了编码, 在缓存到内存中的时候是不会解码的,这样和不编码的情况相比,相同的数据块,编码后占用的内存更小, 即提高了内存的使用率
//如果设置了编码,用户必须在取数据的时候进行解码, 因此在内存充足的情况下会降低读写性能。
//在任何情况下开启PREFIX_TREE编码都是安全的
//不要同时开启PREFIX_TREE和SNAPPY
//通常情况下 SNAPPY并不能比 PREFIX_TREE取得更好的优化效果
//hcd.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);
//默认为64k 65536
//随着blocksize的增大, 系统随机读的吞吐量不断的降低,延迟也不断的增大,
//64k大小比16k大小的吞吐量大约下降13%,延迟增大13%
//128k大小比64k大小的吞吐量大约下降22%,延迟增大27%
//对于随机读取为主的业务,可以考虑调低blocksize的大小
//随着blocksize的增大, scan的吞吐量不断的增大,延迟也不断降低,
//64k大小比16k大小的吞吐量大约增加33%,延迟降低24%
//128k大小比64k大小的吞吐量大约增加7%,延迟降低7%
//对于scan为主的业务,可以考虑调大blocksize的大小
//如果业务请求以Get为主,则可以适当的减小blocksize的大小
//如果业务是以scan请求为主,则可以适当的增大blocksize的大小
//系统默认为64k, 是一个scan和get之间取的平衡值
//hcd.setBlocksize(s)
//设置表中数据的存储生命期,过期数据将自动被删除,
// 例如如果只需要存储最近两天的数据,
// 那么可以设置setTimeToLive(2 * 24 * 60 * 60)
if( ttl < 0 ) ttl = HConstants.FOREVER;
hcd.setTimeToLive(ttl);
htd.addFamily( hcd);
return htd;
}
public static boolean createTable(HBaseTableParam param){
String nameSpace = param.getNameSpace();
if(!"default".equalsIgnoreCase(nameSpace)){
checkArgument(createNameSpace(nameSpace), String.format("创建命名空间%s失败。", nameSpace));
}
HTableDescriptor desc = createHTableDescriptor(param);
byte[][] splits = param.getSplits();
if(splits == null){
return createTable(desc);
}else{
return createTable(desc, splits);
}
}
public static HTableDescriptor createHTableDescriptor(HBaseTableParam param){
String tableName = String.format("%s:%s", param.getNameSpace(), param.getTableName());
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));
for(String coprocessorClassName : param.getCoprocessorClazz()){
try {
htd.addCoprocessor(coprocessorClassName);
} catch (IOException e) {
LOG.error(String.format("为表 %s 添加协处理器失败。", tableName), e);
}
}
HColumnDescriptor hcd = new HColumnDescriptor(param.getCf());
hcd.setBloomFilterType(param.getBloomType());
hcd.setMaxVersions(param.getMaxVersions());
hcd.setScope(param.getReplicationScope());
hcd.setBlocksize(param.getBlocksize());
hcd.setInMemory(param.isInMemory());
hcd.setTimeToLive(param.getTtl());
/* 数据量大,边压边写也会提升性能的,毕竟IO是大数据的最严重的瓶颈,哪怕使用了SSD也是一样。众多的压缩方式中,推荐使用SNAPPY。从压缩率和压缩速度来看,性价比最高。 */
if(param.isUsePrefix_tree())hcd.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);
if(param.isUseSnappy())hcd.setCompressionType(Compression.Algorithm.SNAPPY);
htd.addFamily( hcd);
return htd;
}
public static void closeTable( Table table ){
if( table != null ){
try {
table.close();
} catch (IOException e) {
LOG.error(" ", e);
}
table = null;
}
}
public static byte[][] getSplitKeys() {
//String[] keys = new String[]{"50|"};
//String[] keys = new String[]{"25|","50|","75|"};
//String[] keys = new String[]{"13|","26|","39|", "52|","65|","78|","90|"};
String[] keys = new String[]{ "06|","13|","20|", "26|","33|", "39|","46|", "52|","58|", "65|","72|","78|", "84|","90|","95|"};
//String[] keys = new String[]{"10|", "20|", "30|", "40|", "50|", "60|", "70|", "80|", "90|"};
byte[][] splitKeys = new byte[keys.length][];
TreeSet<byte[]> rows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);//升序排序
for (int i = 0; i < keys.length; i++) {
rows.add(Bytes.toBytes(keys[i]));
}
Iterator<byte[]> rowKeyIter = rows.iterator();
int i = 0;
while (rowKeyIter.hasNext()) {
byte[] tempRow = rowKeyIter.next();
rowKeyIter.remove();
splitKeys[i] = tempRow;
i++;
}
return splitKeys;
}
public static class HBaseTableParam{
private final String nameSpace; //命名空间
private final String tableName; //表名
private final String cf; //列簇
private Set<String> coprocessorClazz = Sets.newHashSet("org.apache.hadoop.hbase.coprocessor.AggregateImplementation");
private int maxVersions = 1; //版本号 默认为1
private BloomType bloomType = BloomType.ROWCOL;
private boolean inMemory = false;
private int replicationScope = 1;
private boolean useSnappy = false; //默认不使用压缩
private boolean usePrefix_tree = false;
private int blocksize = 65536;
private int ttl = HConstants.FOREVER;
private byte[][] splits;
public HBaseTableParam(String nameSpace, String tableName, String cf) {
super();
this.nameSpace = nameSpace == null ? "default" : nameSpace;
this.tableName = tableName;
this.cf = cf;
}
public String getNameSpace() {
return nameSpace;
}
public String getTableName() {
return tableName;
}
public String getCf() {
return cf;
}
public Set<String> getCoprocessorClazz() {
return coprocessorClazz;
}
public void clearCoprocessor(){
coprocessorClazz.clear();
}
public void addCoprocessorClazz(String clazz) {
this.coprocessorClazz.add(clazz);
}
public void addCoprocessorClazz(String ... clazz) {
addCoprocessorClazz(Arrays.asList(clazz));
}
public void addCoprocessorClazz(Collection<String> clazz) {
this.coprocessorClazz.addAll(clazz);
}
public int getMaxVersions() {
return maxVersions;
}
public void setMaxVersions(int maxVersions) {
this.maxVersions = maxVersions <= 0 ? 1 : maxVersions;
}
public BloomType getBloomType() {
return bloomType;
}
public void setBloomType(BloomType bloomType) {
this.bloomType = bloomType == null ? BloomType.ROWCOL : bloomType;
}
public boolean isInMemory() {
return inMemory;
}
public void setInMemory(boolean inMemory) {
this.inMemory = inMemory;
}
public int getReplicationScope() {
return replicationScope;
}
public void setReplicationScope(int replicationScope) {
this.replicationScope = replicationScope < 0 ? 1 : replicationScope;
}
public boolean isUseSnappy() {
return useSnappy;
}
/**
* 控制是否使用 snappy 压缩数据, 默认是不启用
* @param useSnappy
*/
public void setUseSnappy(boolean useSnappy) {
this.useSnappy = useSnappy;
}
public boolean isUsePrefix_tree() {
return usePrefix_tree;
}
/**
* 控制是否使用数据编码,默认是不使用
*
* 如果数据存储时设置了编码, 在缓存到内存中的时候是不会解码的,这样和不编码的情况相比,相同的数据块,编码后占用的内存更小, 即提高了内存的使用率
* 如果设置了编码,用户必须在取数据的时候进行解码, 因此在内存充足的情况下会降低读写性能。
* 在任何情况下开启PREFIX_TREE编码都是安全的
* 不要同时开启PREFIX_TREE和SNAPPY
* 通常情况下 SNAPPY并不能比 PREFIX_TREE取得更好的优化效果
*/
public void setUsePrefix_tree(boolean usePrefix_tree) {
this.usePrefix_tree = usePrefix_tree;
}
public int getBlocksize() {
return blocksize;
}
/**
*默认为64k 65536
*随着blocksize的增大, 系统随机读的吞吐量不断的降低,延迟也不断的增大,
*64k大小比16k大小的吞吐量大约下降13%,延迟增大13%
*128k大小比64k大小的吞吐量大约下降22%,延迟增大27%
*对于随机读取为主的业务,可以考虑调低blocksize的大小
*
*随着blocksize的增大, scan的吞吐量不断的增大,延迟也不断降低,
*64k大小比16k大小的吞吐量大约增加33%,延迟降低24%
*128k大小比64k大小的吞吐量大约增加7%,延迟降低7%
*对于scan为主的业务,可以考虑调大blocksize的大小
*
*如果业务请求以Get为主,则可以适当的减小blocksize的大小
*如果业务是以scan请求为主,则可以适当的增大blocksize的大小
*系统默认为64k, 是一个scan和get之间取的平衡值
*
*/
public void setBlocksize(int blocksize) {
this.blocksize = blocksize <= 0 ? 65536 : blocksize;
}
public int getTtl() {
return ttl;
}
/**
* 默认是永久保存
* @param ttl 大于 零的整数, <= 0 ? tt 为 永久保存
*/
public void setTtl(int ttl) {
this.ttl = ttl <= 0 ? HConstants.FOREVER : ttl;
}
public byte[][] getSplits() {
return splits;
}
/**
* 预分区的rowKey范围配置
* @param splits
*/
public void setSplits(byte[][] splits) {
this.splits = splits;
}
}
public static void main(String[] args) throws Exception{
Admin admin = conf.getHconnection().getAdmin();
System.out.println(admin);
//deleteTable("test:user");
// HBaseTableUtil.createTable("aaaaa","info1",true,-1,1);
// HBaseTableUtil.truncateTable("aaaaa");
/* boolean b = tableExists("test:user2");
Table table = getTable("test:user2");
System.out.println("=================="+table);
System.out.println("=================="+table.getName());*/
//HBaseTableUtil.deleteTable("aaaaa");
/* Table table = HBaseTableUtil.getTable("countform:typecount");
System.out.println(table);*/
/*
boolean b = HBaseTableUtil.tableExists("countform:typecount");
System.out.println(b);*/
HBaseTableUtil.deleteTable("tanslator");
HBaseTableUtil.deleteTable("ability");
HBaseTableUtil.deleteTable("task");
HBaseTableUtil.deleteTable("paper");
// HbaseSearchService hbaseSearchService=new HbaseSearchService();
// Map<String, String> stringStringMap = hbaseSearchService.get("countform:bsid","", new BaseMapRowExtrator());
// Map<String, String> aaaaa = hbaseSearchService.get("countform:bsid", "aaaaa", new BaseMapRowExtrator());
// System.out.println(aaaaa);
}
}
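下面补充一个调用 HBaseTableUtil 建表的最小示意(非项目源码,表名 test:wechat、列簇 cf、TTL 等参数均为假设,仅演示 createHTableDescriptor 与预分区建表的配合方式):
package com.hsiehchou.hbase.config;

import org.apache.hadoop.hbase.HTableDescriptor;

//示意代码:演示 HBaseTableUtil 的典型建表调用
public class CreateTableDemo {
    public static void main(String[] args) {
        //TTL为30天、最多保留100个版本、开启SNAPPY压缩
        HTableDescriptor htd = HBaseTableUtil.createHTableDescriptor(
                "test:wechat", "cf", false, 30 * 24 * 60 * 60, 100, true);
        //用类内定义的split keys做预分区,避免写入热点
        boolean ok = HBaseTableUtil.createTable(htd, HBaseTableUtil.getSplitKeys());
        System.out.println("建表结果:" + ok);
    }
}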
com/hsiehchou/hbase/entity/AbstractRow.java
package com.hsiehchou.hbase.entity;
import com.google.common.collect.HashMultimap;
import com.google.common.collect.Sets;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
public abstract class AbstractRow<T extends HBaseCell> {
protected String rowKey;
protected HashMultimap<String, T> cells;
protected Set<String> fields;
protected long maxCapTime;
public AbstractRow(String rowKey){
this.rowKey = rowKey;
cells = HashMultimap.create();
fields = Sets.newHashSet();
}
public boolean addCell(String field, String value, long capTime){
return addCell(field, createCell(field, value, capTime));
}
public boolean addCell(String field, T cell){
fields.add(cell.getField());
if(cell.getCapTime() > maxCapTime)
maxCapTime = cell.getCapTime();
return cells.put(field, cell);
}
public boolean[] addCell(String field, Collection<T> cells){
boolean[] status = new boolean[cells.size()];
int n = 0;
for(T cell : cells){
status[n] = addCell(field, cell);
n++;
}
return status;
}
public String getRowKey() {
return rowKey;
}
protected abstract T createCell(String field, String value, long capTime);
public Map<String, Collection<T>> getCell() {
return cells.asMap();
}
public Collection<T> getCellByField(String field){
return cells.get(field);
}
public Set<Map.Entry<String, T>> entries(){
return cells.entries();
}
@Override
public String toString() {
return "AbstractRow [rowKey=" + rowKey + ", cells=" + cells + "]";
}
public boolean equals(Object obj) {
if(this == obj)return true ;
if(!(obj instanceof AbstractRow))return false ;
@SuppressWarnings("unchecked")
AbstractRow<T> row = (AbstractRow<T>) obj;
if(rowKey.equals(row.getRowKey()))return true;
return false;
}
public int hashCode(){
return this.rowKey.hashCode();
}
public long getMaxCapTime() {
return maxCapTime;
}
public Set<String> getFields() {
return Sets.newHashSet(fields);
}
}
com/hsiehchou/hbase/entity/HBaseCell.java
package com.hsiehchou.hbase.entity;
public class HBaseCell implements Comparable<HBaseCell>{
protected String field;
protected String value;
protected Long capTime;
public HBaseCell(String field, String value, long capTime){
this.field = field;
this.capTime = capTime;
this.value = value;
}
public String getField(){
return field;
}
public String getValue(){
return value;
}
public void setCapTime(long capTime) {
this.capTime = capTime;
}
public Long getCapTime() {
return capTime;
}
public String toString(){
return String.format("%s_[%s]_%s", field, capTime, value);
}
public int compareTo(HBaseCell o) {
return o.getCapTime().compareTo(this.capTime);
}
public boolean equals(Object obj) {
if(this == obj)return true ;
if(!(obj instanceof HBaseCell))return false ;
HBaseCell cell = (HBaseCell)obj;
if(field.equals(cell.getField()) && value.equals(cell.getValue())){
if(cell.getCapTime() < capTime){
cell.setCapTime(this.capTime);
}
return true;
}
return false;
}
public int hashCode(){
return this.field.hashCode() + 31*this.value.hashCode();
}
}
com/hsiehchou/hbase/entity/HBaseRow.java
package com.hsiehchou.hbase.entity;
public class HBaseRow extends AbstractRow<HBaseCell> {
public HBaseRow(String rowKey){
super(rowKey);
}
public boolean[] addCell(String field, HBaseCell ... cells){
boolean[] status = new boolean[cells.length];
for(int i = 0; i < cells.length; i++){
status[i] = addCell(field, cells[i]);
}
return status;
}
protected HBaseCell createCell(String field, String value, long capTime) {
return new HBaseCell(field, value, capTime);
}
}
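HBaseRow/HBaseCell 主要配合后面的 MultiVersionRowExtrator 使用,用于按字段聚合多版本数据。下面是一个示意用法(MAC、手机号、IMSI 均为假设数据):
package com.hsiehchou.hbase.entity;

//示意代码:演示 HBaseRow 的基本用法
public class HBaseRowDemo {
    public static void main(String[] args) {
        HBaseRow row = new HBaseRow("aa-bb-cc-dd-ee-ff");
        row.addCell("phone", "13800000000", 1554350400000L);
        //field+value相同的cell不会重复加入(HashMultimap按equals/hashCode去重)
        row.addCell("phone", "13800000000", 1554436800000L);
        row.addCell("imsi", "460011418603055", 1554350400000L);
        System.out.println(row.getFields());     //出现过的字段集合
        System.out.println(row.getMaxCapTime()); //最新的采集时间 1554436800000
    }
}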
com/hsiehchou/hbase/extractor/BaseListRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class BaseListRowExtrator implements RowExtractor<List<String>>{
private List<String> row;
public Long lastcjtime = 0l;
public Long firstcjtime = 0l;
@Override
public List<String> extractRowData(Result result, int rowNum)
throws IOException {
row = new ArrayList<String>();
for(Cell cell : result.listCells()) {
String column = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
if(column.equalsIgnoreCase("cjtime")) {
Long v = Long.parseLong(value);
if(v > lastcjtime) {
lastcjtime = v;
}
//firstcjtime初始为0,第一次遇到cjtime时直接赋值,之后取最小值
if(firstcjtime == 0L || v < firstcjtime) {
firstcjtime = v;
}
}
row.add(value);
}
return row;
}
}
com/hsiehchou/hbase/extractor/BaseMapRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class BaseMapRowExtrator implements RowExtractor<Map<String,String>> {
private Map<String,String> row;
private List<byte[]> rows;
private String longTimeField;
private SimpleDateFormat format;
private String field;
private String value;
private long time;
public BaseMapRowExtrator(){}
/**
* @param rows 需要提取 所有的 rowKey , null 则不提取
*/
public BaseMapRowExtrator(List<byte[]> rows){
this.rows = rows;
}
/**
* @param rows 需要提取 所有的 rowKey , null 则不提取
* @param longTimeField long类型的时间字段 表示需要将其转换称 String 类型
*/
public BaseMapRowExtrator(List<byte[]> rows,String longTimeField){
this.rows = rows;
this.longTimeField = longTimeField;
}
/**
* @param rows 需要提取 所有的 rowKey , null 则不提取
* @param longTimeField long类型的时间字段
* @param timePattern 表示需要以该指定的格式 将时间字段的值转换成字符串
*/
public BaseMapRowExtrator(List<byte[]> rows,String longTimeField,String timePattern){
this.rows = rows;
this.longTimeField = longTimeField;
if(StringUtils.isNotBlank(timePattern)){
format = new SimpleDateFormat(timePattern);
}
}
public Map<String, String> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,String>();
List<Cell> cells = result.listCells();
for(Cell cell : cells) {
field = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
if( field.equals(longTimeField) ){
time = Bytes.toLong(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
if( format != null ){
value = format.format(new Date(time));
}else{
value = String.valueOf(time);
}
}else{
value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
}
row.put(field,value);
}
if( rows != null ){
rows.add(result.getRow());
}
return row;
}
}
com/hsiehchou/hbase/extractor/BaseMapWithRowKeyExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class BaseMapWithRowKeyExtrator implements RowExtractor<Map<String,String>> {
private Map<String,String> row;
/* (non-Javadoc)
* @see com.bh.d406.bigdata.hbase.extractor.RowExtractor#extractRowData(org.apache.hadoop.hbase.client.Result, int)
*/
@Override
public Map<String, String> extractRowData(Result result, int rowNum)
throws IOException {
row = new HashMap<String,String>();
row.put("rowKey", Bytes.toString( result.getRow() ));
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/BeanRowExtrator.java
package com.hsiehchou.hbase.extractor;
import com.google.common.collect.Maps;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Map;
public class BeanRowExtrator<T> implements RowExtractor<T> {
private static final Logger LOG = LoggerFactory.getLogger(BeanRowExtrator.class);
private Class<T> clazz;
private Map<String,Field> fieldMap;
public BeanRowExtrator(Class<T> clazz){
this.clazz = clazz;
this.fieldMap = getDeclaredFields(clazz);
}
public T extractRowData(Result result, int rowNum) throws IOException {
return resultReflectToClass(result, rowNum);
}
private T resultReflectToClass(Result result, int rowNum){
String column = null;
Field field = null;
T obj = null;
try {
obj = clazz.newInstance();
for(Cell cell : result.listCells()){
column = Bytes.toString(cell.getQualifierArray(),
cell.getQualifierOffset(), cell.getQualifierLength());
/*检查该列是否在实体类中存在对应的属性,若存在则 为其赋值*/
if((field = fieldMap.get(column.toLowerCase())) != null){
field.set(obj, Bytes.toString(cell.getValueArray(),
cell.getValueOffset(), cell.getValueLength()));
}
}
} catch (InstantiationException e) {
LOG.error(String.format("解析第%个满足条件的记录%s失败。", rowNum, result), e);
} catch (IllegalAccessException e) {
LOG.error(String.format("解析第%s个满足条件的记录%s失败。", rowNum, result), e);
}
return obj;
}
private Map<String,Field> getDeclaredFields(Class<?> clazz){
Field[] fields = clazz.getDeclaredFields();
Field field = null;
Map<String,Field> fieldMap = Maps.newHashMapWithExpectedSize(fields.length);
for(int i = 0; i < fields.length; i++){
field = fields[i];
if(field.getModifiers() == 2){
field.setAccessible(true);
fieldMap.put(field.getName().toLowerCase(), field);
}
}
fields = null;
return fieldMap;
}
}
com/hsiehchou/hbase/extractor/CellNumExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
public class CellNumExtrator implements RowExtractor<Integer> {
public Integer extractRowData(Result result, int rowNum) throws IOException {
return result.listCells().size();
}
}
com/hsiehchou/hbase/extractor/MapLongRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class MapLongRowExtrator implements RowExtractor<Map<String,Long>> {
private Map<String,Long> row;
@Override
public Map<String, Long> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,Long>();
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toLong(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/MapRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
public class MapRowExtrator implements RowExtractor<Map<String,String>>,Serializable {
private static final long serialVersionUID = 1543027485077396235L;
private Map<String,String> row;
/* (non-Javadoc)
* @see com.bh.d406.bigdata.hbase.extractor.RowExtractor#extractRowData(org.apache.hadoop.hbase.client.Result, int)
*/
@Override
public Map<String, String> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,String>();
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/MultiVersionRowExtrator.java
package com.hsiehchou.hbase.extractor;
import com.hsiehchou.hbase.entity.HBaseRow;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class MultiVersionRowExtrator implements RowExtractor<HBaseRow>{
private HBaseRow row;
public HBaseRow extractRowData(Result result, int rowNum) throws IOException {
row = new HBaseRow(Bytes.toString(result.getRow()));
String field = null;
String value = null;
long capTime = 0L;
for(Cell cell : result.listCells()){
field = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
capTime = cell.getTimestamp();
row.addCell(field, value, capTime);
}
return row ;
}
}
com/hsiehchou/hbase/extractor/OneColumnRowByteExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
import java.io.Serializable;
public class OneColumnRowByteExtrator implements RowExtractor<byte[]> ,Serializable{
private static final long serialVersionUID = -3420092335124240222L;
private byte[] cf;
private byte[] cl;
public OneColumnRowByteExtrator( byte[] cf,byte[] cl ){
this.cf = cf;
this.cl = cl;
}
public byte[] extractRowData(Result result, int rowNum) throws IOException {
return result.getValue(cf, cl);
}
}
com/hsiehchou/hbase/extractor/OneColumnRowStringExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.io.Serializable;
public class OneColumnRowStringExtrator implements RowExtractor<String> , Serializable{
private static final long serialVersionUID = -8585637277902568648L;
private byte[] cf ;
private byte[] cl ;
public OneColumnRowStringExtrator( byte[] cf , byte[] cl ){
this.cf = cf;
this.cl = cl;
}
/* (non-Javadoc)
* @see com.bh.d406.bigdata.hbase.extractor.RowExtractor#extractRowData(org.apache.hadoop.hbase.client.Result, int)
*/
@Override
public String extractRowData(Result result, int rowNum) throws IOException {
byte[] value = result.getValue(cf, cl);
if( value == null ) return null;
return Bytes.toString( value ) ;
}
}
com/hsiehchou/hbase/extractor/OnlyRowKeyExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
public class OnlyRowKeyExtrator implements RowExtractor<byte[]> {
@Override
public byte[] extractRowData(Result result, int rowNum) throws IOException {
// TODO Auto-generated method stub
return result.getRow();
}
}
com/hsiehchou/hbase/extractor/OnlyRowKeyStringExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class OnlyRowKeyStringExtrator implements RowExtractor<String> {
public String extractRowData(Result result, int rowNum) throws IOException {
return Bytes.toString( result.getRow() );
}
}
com/hsiehchou/hbase/extractor/RowExtractor.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
public interface RowExtractor<T> {
/**
* description:
* @param result result解析器
* @param rowNum
* @return
* @throws Exception
* T
*/
T extractRowData(Result result, int rowNum) throws IOException;
}
com/hsiehchou/hbase/extractor/SingleColumnMultiVersionRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.Set;
public class SingleColumnMultiVersionRowExtrator implements RowExtractor<Set<String>>{
private Set<String> values;
private byte[] cf;
private byte[] cl;
/**
* 单列解析器 获取hbase 单列多版本数据
* @param cf 列簇
* @param cl 列
* @param values 返回值
*/
public SingleColumnMultiVersionRowExtrator(byte[] cf, byte[] cl, Set<String> values){
this.cf = cf;
this.cl = cl;
this.values = values;
}
public Set<String> extractRowData(Result result, int rowNum) throws IOException {
for(Cell cell : result.getColumnCells(cf, cl)){
values.add(Bytes.toString(cell.getValueArray(),cell.getValueOffset(), cell.getValueLength()));
}
return values;
}
}
com/hsiehchou/hbase/extractor/StrToByteExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
public class StrToByteExtrator implements RowExtractor<Map<String,byte[]>> ,Serializable {
private static final long serialVersionUID = 4633698173362569711L;
private Map<String,byte[]> row;
@Override
public Map<String, byte[]> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,byte[]>();
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),
Bytes.copy(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/ToRowList.java
package com.hsiehchou.hbase.extractor;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
/**
* Hbase数据库中数据提取接口实现:
* 提取result的rowKey,和每个cell的值作为一行数据,
* 一个cell=(row, family:qualifier:value, version)
*
* <p>
* 每行数据的格式为:{rowKey column${separator}value column${separator}value ...}
* 其中,不同的列之间用空格分隔,同样列元素的描述符与值之间用${separator}分隔
*/
public class ToRowList implements RowExtractor<List<String>> {
private Boolean currentVersion; //currentVersion为true:只取当前最新版本,false:取所有版本
private char separator; //不同元素之间拼接时的分隔符,默认为`#`
private ToRowList(Boolean currentVersion, char separator) {
this.separator = separator;
this.currentVersion = currentVersion;
}
public ToRowList(Boolean currentVersion) {
this(currentVersion, '#');
}
public ToRowList() {
this(true, '#');
}
/**
* 对{当前版本}存放在list[0] = {rowKey` `column`#`value` `column`#`value ...}
* 多版本的时候list({rowKey`#`version1` `column`#`value` `column`#`value ...},
* {rowKey`#`version2` `column`#`value` `column`#`value ...})
*/
@Override
public List<String> extractRowData(Result result, int rowNum) throws IOException {
if(result == null || result.isEmpty()) return null;
final char SPACE = ' ';
List<String> rows = new LinkedList<>();
//一个result是同一个rowKey的所有cells集合
String rowKey = Bytes.toString(result.getRow());
//build rowKey` `column`#`value` `column`#`value ...
StringBuilder row = new StringBuilder();
row.append(rowKey).append(SPACE);
//用于处理不同版本的映射
Map<Long, String> version2qualifiersAndValues = new HashMap<>();
List<Cell> cells = result.listCells();
for (Cell cell : cells) {
String value = Bytes.toString(cell.getValueArray(),
cell.getValueOffset(), cell.getValueLength());
String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
if (currentVersion) {
row.append(qualifier).append(separator).append(value).append(SPACE);
} else {
Long version = cell.getTimestamp();
String tmp = version2qualifiersAndValues.get(version);
version2qualifiersAndValues.put(version,
StringUtils.isNotBlank(tmp) ? tmp + " " + qualifier + separator + value
: rowKey + separator + version + " " + qualifier + separator + value);
}
}
if (currentVersion) {
rows.add(row.toString());
} else {
for (String v : version2qualifiersAndValues.values()) {
rows.add(v);
}
}
return rows;
}
}
com/hsiehchou/hbase/extractor/ToRowMap.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
* currentVersion 标识是否取多版本的数据,默认取当前版本
* 对当前版本,返回row`#`qualifier->value的映射
* 对多个版本,返回row`#`version`#`qualifier->value的映射
*/
public class ToRowMap implements RowExtractor<Map<String, String>> {
private Boolean currentVersion;
public ToRowMap() {
this(true);
}
private ToRowMap(Boolean currentVersion) {
this.currentVersion = currentVersion;
}
@Override
public Map<String, String> extractRowData(Result result, int rowNum)
throws IOException {
if(result == null || result.isEmpty()) return null;
final char HashTag = '#';
HashMap<String, String> col2value = new HashMap<>();
String rowKey = Bytes.toString(result.getRow());
for (Cell cell : result.listCells()) {
String value = Bytes.toString(cell.getValueArray(),
cell.getValueOffset(), cell.getValueLength());
String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
if (currentVersion)
col2value.put(rowKey + HashTag + qualifier, value);
else {
long version = cell.getTimestamp();
col2value.put(rowKey + HashTag + version + HashTag + qualifier, value);
}
}
return col2value;
}
}
com/hsiehchou/hbase/insert/HBaseInsertException.java
package com.hsiehchou.hbase.insert;
import java.util.Iterator;
public class HBaseInsertException extends Exception{
public HBaseInsertException(String message) {
super(message);
}
public final synchronized void addSuppresseds(Iterable<Exception> exceptions){
if(exceptions != null){
Iterator<Exception> iterator = exceptions.iterator();
while (iterator.hasNext()){
addSuppressed(iterator.next());
}
}
}
}
com/hsiehchou/hbase/insert/HBaseInsertHelper.java
package com.hsiehchou.hbase.insert;
import com.hsiehchou.hbase.config.HBaseTableUtil;
import com.google.common.collect.Lists;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
* 添加HBASE 插入数据类
*/
public class HBaseInsertHelper implements Serializable{
private HBaseInsertHelper(){}
public static void put(String tableName, Put put) throws Exception {
put(tableName, Lists.newArrayList(put));
}
public static void put(String tableName, List<Put> puts) throws Exception {
if(!puts.isEmpty()){
Table table = HBaseTableUtil.getTable(tableName);
try {
table.put(puts);
}catch (Exception e){
e.printStackTrace();
}finally {
HBaseTableUtil.close(table);
}
}
}
public static void put(final String tableName, List<Put> puts, int perThreadPutSize) throws Exception {
int size = puts.size();
if(size > perThreadPutSize){
int threadNum = (int)Math.ceil(size / (double)perThreadPutSize);
ExecutorService executorService = Executors.newFixedThreadPool(threadNum);
final CountDownLatch cdl = new CountDownLatch(threadNum);
final List<Exception> es = Collections.synchronizedList(new ArrayList<Exception>());
try {
for(int i = 0; i < threadNum; i++){
final List<Put> tmp;
if(i == (threadNum - 1)){
tmp = puts.subList(perThreadPutSize*i, size);
}else{
tmp = puts.subList(perThreadPutSize*i, perThreadPutSize*(i + 1));
}
executorService.execute(new Runnable() {
public void run() {
try {
if(es.isEmpty()) put(tableName, tmp);
} catch (Exception e) {
es.add(e);
}finally {
cdl.countDown();
}
}
});
}
cdl.await();
}finally {
executorService.shutdown();
}
if(es.size() > 0){
HBaseInsertException insertException = new HBaseInsertException(String.format("put数据到表%s失败。", tableName));
insertException.addSuppresseds(es);
throw insertException;
}
}else {
put(tableName, puts);
}
}
public static void checkAndPut(String tableName, byte[] row, byte[] family, byte[] qualifier,
byte[] value, Put put) throws Exception {
checkAndPut(tableName, row, family, qualifier, null, value, put);
}
public static void checkAndPut(String tableName, byte[] row, byte[] family, byte[] qualifier,
CompareOp compareOp, byte[] value, Put put) throws Exception {
if(!put.isEmpty() ){
Table table = HBaseTableUtil.getTable(tableName);
try {
if(compareOp == null){
table.checkAndPut(row, family, qualifier, value, put);
}else{
table.checkAndPut(row, family, qualifier, compareOp, value, put);
}
}finally{
HBaseTableUtil.close(table);
}
}
}
}
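下面是 HBaseInsertHelper 的一个批量写入示意(非项目源码,表名 test:wechat、列簇 cf、字段均为假设;perThreadPutSize 取 2000 表示每个线程最多提交 2000 条):
package com.hsiehchou.hbase.insert;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.ArrayList;
import java.util.List;

//示意代码:演示多线程批量put
public class InsertDemo {
    public static void main(String[] args) throws Exception {
        List<Put> puts = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            Put put = new Put(Bytes.toBytes("rowkey_" + i));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("phone"), Bytes.toBytes("1380000" + i));
            puts.add(put);
        }
        //超过2000条时内部会切分成多个子列表并交给线程池并发写入
        HBaseInsertHelper.put("test:wechat", puts, 2000);
    }
}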
com/hsiehchou/hbase/search/HBaseSearchService.java
package com.hsiehchou.hbase.search;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Scan;
import java.io.IOException;
import java.util.List;
import java.util.Map;
public interface HBaseSearchService {
/**
* 根据 用户 给定的解析类 解析 查询结果
* @param tableName
* @param scan
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> search(String tableName, Scan scan, RowExtractor<T> extractor) throws IOException;
/**
* 当存在多个 scan时 采用多线程查询
* @param tableName
* @param scans
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Scan> scans, RowExtractor<T> extractor) throws IOException;
/**
* 采用多线程 同时查询多个表
* @param more
* @return
* @throws IOException
* List<T>
*/
<T> Map<String,List<T>> searchMore(List<SearchMoreTable<T>> more) throws IOException;
/**
* 利用反射 自动封装实体类
* @param tableName
* @param scan
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* @throws IllegalAccessException
* List<T>
*/
<T> List<T> search(String tableName, Scan scan, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* 当存在多个 scan 时 采用多线程查询
* @param tableName
* @param scans
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* @throws IllegalAccessException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Scan> scans, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* 批量 get 查询 并按自定义的方式解析结果集
* @param tableName
* @param gets
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException;
/**
* 多线程批量get, 并按自定义的方式解析结果集
* 建议 : perThreadExtractorGetNum >= 100
* @param tableName
* @param gets
* @param perThreadExtractorGetNum 每个线程处理的 get的个数
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, RowExtractor<T> extractor) throws IOException;
/**
* 批量 get 查询 并利用反射 封装到指定的实体类中
* @param tableName
* @param gets
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* List<T>
*/
<T> List<T> search(String tableName, List<Get> gets, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* 多线程批量 get 查询 并利用反射 封装到指定的实体类中
* 建议 : perThreadExtractorGetNum >= 100
* @param tableName
* @param gets
* @param perThreadExtractorGetNum 每个线程处理的 get的个数
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* @throws IllegalAccessException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* get 查询 并按自定义的方式解析结果集
* @param tableName
* @param extractor 用户自定义的 结果解析 类
* @return 如果 查询不到 则 返回 null
* @throws IOException
* List<T>
*/
<T> T search(String tableName, Get get, RowExtractor<T> extractor) throws IOException;
/**
* get 查询 并利用反射 封装到指定的实体类中
* @param tableName
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return 如果 查询不到 则 返回 null
* @throws IOException
* @throws InstantiationException
* List<T>
*/
<T> T search(String tableName, Get get, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
}
com/hsiehchou/hbase/search/HBaseSearchServiceImpl.java
package com.hsiehchou.hbase.search;
import com.hsiehchou.hbase.config.HBaseTableFactory;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
public class HBaseSearchServiceImpl implements HBaseSearchService,Serializable{
private static final long serialVersionUID = -8657479861137115645L;
private static final Logger LOG = LoggerFactory.getLogger(HBaseSearchServiceImpl.class);
private HBaseTableFactory factory = new HBaseTableFactory();
private int poolCapacity = 6;
@Override
public <T> List<T> search(String tableName, Scan scan, RowExtractor<T> extractor) throws IOException {
return null;
}
@Override
public <T> List<T> searchMore(String tableName, List<Scan> scans, RowExtractor<T> extractor) throws IOException {
return null;
}
@Override
public <T> Map<String, List<T>> searchMore(List<SearchMoreTable<T>> more) throws IOException {
return null;
}
@Override
public <T> List<T> search(String tableName, Scan scan, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> List<T> searchMore(String tableName, List<Scan> scans, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException {
List<T> data = new ArrayList<T>();
search(tableName, gets, extractor,data);
return data;
}
@Override
public <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, RowExtractor<T> extractor) throws IOException {
return null;
}
@Override
public <T> List<T> search(String tableName, List<Get> gets, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> T search(String tableName, Get get, RowExtractor<T> extractor) throws IOException {
T obj = null;
List<T> res = search(tableName,Arrays.asList(get),extractor);
if( !res.isEmpty()){
obj = res.get(0);
}
return obj;
}
@Override
public <T> T search(String tableName, Get get, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
private <T> void search(String tableName, List<Get> gets,
RowExtractor<T> extractor , List<T> data ) throws IOException {
//根据table名获取表连接
Table table = factory.getHBaseTableInstance(tableName);
if(table != null ){
Result[] results = table.get(gets);
int n = 0;
T row = null;
for( Result result : results){
if( !result.isEmpty() ){
row = extractor.extractRowData(result, n);
if(row != null )data.add(row);
n++;
}
}
close( table, null);
}else{
throw new IOException(" table " + tableName + " is not exists ..");
}
}
public static boolean existsRowkey( Table table, String rowkey){
boolean exists =true;
try {
exists = table.exists(new Get(rowkey.getBytes()));
} catch (IOException e) {
LOG.error("失败。", e );
}
return exists;
}
public static void close( Table table, ResultScanner scanner ){
try {
if( table != null ){
table.close();
table = null;
}
if( scanner != null ){
scanner.close();
scanner = null;
}
} catch (IOException e) {
LOG.error("关闭 HBase的表 " + table.getName().toString() + " 失败。", e );
}
}
}
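HBaseSearchServiceImpl 目前只实现了基于 Get 的查询路径,下面给出一个点查示意(表名、rowkey 均为假设):
package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.extractor.MapRowExtrator;
import org.apache.hadoop.hbase.client.Get;
import java.util.Map;

//示意代码:按rowkey点查并解析成 列名->值 的Map
public class SearchDemo {
    public static void main(String[] args) throws Exception {
        HBaseSearchService searchService = new HBaseSearchServiceImpl();
        Get get = new Get("aa-bb-cc-dd-ee-ff".getBytes());
        //查不到时返回null
        Map<String, String> row = searchService.search("test:relation", get, new MapRowExtrator());
        System.out.println(row);
    }
}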
com/hsiehchou/hbase/search/SearchMoreTable.java
package com.hsiehchou.hbase.search;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.Scan;
public class SearchMoreTable<T> {
private String tableName;
private Scan scan;
private RowExtractor<T> extractor;
public SearchMoreTable() {
super();
}
public SearchMoreTable(String tableName, Scan scan,
RowExtractor<T> extractor) {
super();
this.tableName = tableName;
this.scan = scan;
this.extractor = extractor;
}
public String getTableName() {
return tableName;
}
public void setTableName(String tableName) {
this.tableName = tableName;
}
public Scan getScan() {
return scan;
}
public void setScan(Scan scan) {
this.scan = scan;
}
public RowExtractor<T> getExtractor() {
return extractor;
}
public void setExtractor(RowExtractor<T> extractor) {
this.extractor = extractor;
}
}
com/hsiehchou/hbase/spilt/SpiltRegionUtil.java
package com.hsiehchou.hbase.spilt;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.Iterator;
import java.util.TreeSet;
/**
* hbase 预分区
*/
public class SpiltRegionUtil {
/**
* 定义分区
* @return
*/
public static byte[][] getSplitKeysBydinct() {
String[] keys = new String[]{"1","2", "3","4", "5","6", "7","8", "9","a","b", "c","d","e","f"};
//String[] keys = new String[]{"10|", "20|", "30|", "40|", "50|", "60|", "70|", "80|", "90|"};
byte[][] splitKeys = new byte[keys.length][];
//通过treeset排序
TreeSet<byte[]> rows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);//升序排序
for (int i = 0; i < keys.length; i++) {
rows.add(Bytes.toBytes(keys[i]));
}
Iterator<byte[]> rowKeyIter = rows.iterator();
int i = 0;
while (rowKeyIter.hasNext()) {
byte[] tempRow = rowKeyIter.next();
rowKeyIter.remove();
splitKeys[i] = tempRow;
i++;
}
return splitKeys;
}
}
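如果 rowkey 的首字符是 0-9、a-f(例如以 MAC 作为 rowkey),可以用 getSplitKeysBydinct 生成16个分区键,配合 HBaseTableUtil 建表,示意片段如下(表名 test:relation 为假设):
//示意片段:16个分区键建表,ttl传-1表示永久保存
HTableDescriptor htd = HBaseTableUtil.createHTableDescriptor("test:relation", "cf", false, -1, 100, false);
HBaseTableUtil.createTable(htd, SpiltRegionUtil.getSplitKeysBydinct());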
6、执行
spark-submit \
--master local[1] \
--num-executors 1 \
--driver-memory 300m \
--executor-memory 500m \
--executor-cores 1 \
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
--class com.hsiehchou.spark.streaming.kafka.kafka2hbase.DataRelationStreaming \
/usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
7、执行截图
十二、SpringCloud 项目构建
解决IntelliJ IDEA 创建Maven项目速度慢问题
add Maven Property
Name:archetypeCatalog
Value:internal
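archetypeCatalog=internal 的作用是让 archetype 插件使用本地内置的骨架目录,不再到远程仓库拉取 archetype-catalog.xml,因此创建项目会明显变快。等价的命令行写法示意如下:
mvn archetype:generate -DarchetypeCatalog=internal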
1、构建SpringCloud父项目
在原项目下新建 xz_bigdata_springcloud_dir目录
2、在此目录下新建 xz_bigdata_springcloud_root项目
3、 引入SpringCloud依赖
父pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<modules>
<module>xz_bigdata_springcloud_common</module>
<module>xz_bigdata_springcloud_esquery</module>
<module>xz_bigdata_springcloud_eureka</module>
<module>xz_bigdata_springcloud_hbasequery</module>
</modules>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.9.RELEASE</version>
</parent>
<groupId>com.hsiehchou.springcloud</groupId>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>xz_bigdata_springcloud_root</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<!--CDH源-->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<!--依赖管理,用于管理spring-cloud的依赖-->
<dependencyManagement>
<dependencies>
<!--spring-cloud-dependencies-->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>Finchley.SR1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<!--打包插件-->
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
删除父项目src目录。因为这个项目主要是管理子项目不做任何逻辑业务
4、构建SpringCloud Common子项目
新建子模块
xz_bigdata_springcloud_common
引入依赖
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<groupId>com.hsiehchou.springcloud</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_springcloud_common</artifactId>
<name>xz_bigdata_springcloud_common</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<!--eureka-server-->
<!-- https://mvnrepository.com/artifact/org.springframework.cloud/spring-cloud-starter-eureka-server -->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-netflix-eureka-server</artifactId>
<exclusions>
<exclusion>
<artifactId>HdrHistogram</artifactId>
<groupId>org.hdrhistogram</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.24</version>
</dependency>
</dependencies>
</project>
5、构建Eureka服务注册中心
新建xz_bigdata_springcloud_eureka子模块
引入依赖
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<groupId>com.hsiehchou.springcloud</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_springcloud_eureka</artifactId>
<name>xz_bigdata_springcloud_eureka</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou.springcloud</groupId>
<artifactId>xz_bigdata_springcloud_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<!--用户验证-->
<!-- <dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-security</artifactId>
<version>1.4.1.RELEASE</version>
</dependency>-->
</dependencies>
<build>
<plugins>
<plugin><!--打包依赖的jar包-->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
<stripVersion>false</stripVersion> <!-- 去除版本信息 -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- 拷贝项目依赖包到lib/目录下 -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
<!-- 打成jar包插件 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.5</version>
<configuration>
<archive>
<!--
生成的jar中,不要包含pom.xml和pom.properties这两个文件
-->
<addMavenDescriptor>false</addMavenDescriptor>
<!-- 生成MANIFEST.MF的设置 -->
<manifest>
<!-- 为依赖包添加路径, 这些路径会写在MANIFEST文件的Class-Path下 -->
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<!-- jar启动入口类-->
<mainClass>com.hsiehchou.springcloud.eureka.EurekaApplication</mainClass>
</manifest>
<!-- <manifestEntries>
<!– 在Class-Path下添加配置文件的路径 –>
<Class-Path></Class-Path>
</manifestEntries>-->
</archive>
<outputDirectory>${project.build.directory}/</outputDirectory>
<includes>
<!-- 打jar包时,只打包class文件 -->
<include>**/*.class</include>
<include>**/*.properties</include>
<include>**/*.yml</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
</project>
新建resources配置文件目录,添加application.yml配置文件或者 application.properties
application.yml
server:
port: 8761
eureka:
client:
register-with-eureka: false
fetch-registry: false
service-url:
defaultZone: http://root:root@hadoop3:8761/eureka/
新建EurekaApplication 启动类
EurekaApplication.java
package com.hsiehchou.springcloud.eureka;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;
/**
* 注册中心
*/
@SpringBootApplication
@EnableEurekaServer
public class EurekaApplication
{
public static void main( String[] args )
{
SpringApplication.run(EurekaApplication.class, args);
}
}
执行EurekaApplication 启动
访问localhost:8761
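也可以直接调用 Eureka 的 REST 接口确认注册中心已正常启动(示意命令,假设在本机启动且未开启安全认证):
curl http://localhost:8761/eureka/apps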
6、构建HBase查询服务模块
新建xz_bigdata_springcloud_hbasequery子模块
添加依赖
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<groupId>com.hsiehchou.springcloud</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_springcloud_hbasequery</artifactId>
<name>xz_bigdata_springcloud_hbasequery</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<!--spring common依赖-->
<dependency>
<groupId>com.hsiehchou.springcloud</groupId>
<artifactId>xz_bigdata_springcloud_common</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>HdrHistogram</artifactId>
<groupId>org.hdrhistogram</groupId>
</exclusion>
</exclusions>
</dependency>
<!--基础服务hbase依赖-->
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_hbase</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>fastjson</artifactId>
<groupId>com.alibaba</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<plugin><!--打包依赖的jar包-->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
<stripVersion>false</stripVersion> <!-- 去除版本信息 -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- 拷贝项目依赖包到lib/目录下 -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
<!-- 打成jar包插件 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.5</version>
<configuration>
<archive>
<!--
生成的jar中,不要包含pom.xml和pom.properties这两个文件
-->
<addMavenDescriptor>false</addMavenDescriptor>
<!-- 生成MANIFEST.MF的设置 -->
<manifest>
<!-- 为依赖包添加路径, 这些路径会写在MANIFEST文件的Class-Path下 -->
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<!-- jar启动入口类-->
<mainClass>com.hsiehchou.springcloud.HbaseQueryApplication</mainClass>
</manifest>
<!-- <manifestEntries>
<!– 在Class-Path下添加配置文件的路径 –>
<Class-Path></Class-Path>
</manifestEntries>-->
</archive>
<outputDirectory>${project.build.directory}/</outputDirectory>
<includes>
<!-- 打jar包时,只打包class文件 -->
<include>**/*.class</include>
<include>**/*.properties</include>
<include>**/*.yml</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
</project>
添加配置文件
新建 resources 目录
添加 application.properties 文件
server.port=8002
logging.level.root=INFO
logging.level.org.hibernate=INFO
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
logging.level.org.hibernate.type.descriptor.sql.BasicExtractor=TRACE
logging.level.com.itmuch=DEBUG
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true
eureka.client.serviceUrl.defaultZone=http://root:root@hadoop3:8761/eureka/
spring.application.name=xz-bigdata-springcloud-hbasequery
eureka.instance.prefer-ip-address=true
构建启动类
新建 com.hsiehchou.springcloud.hbase包
构建 HbaseQueryApplication 启动类
package com.hsiehchou.springcloud;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
@SpringBootApplication
@EnableEurekaClient
public class HbaseQueryApplication
{
public static void main( String[] args )
{
SpringApplication.run(HbaseQueryApplication.class, args);
}
}
启动 HbaseQueryApplication 后,在 Eureka 控制台(localhost:8761)的服务列表中能看到该服务,说明注册成功。
构建服务
构建 com.hsiehchou.springcloud.hbase.controller
创建 HbaseBaseController
HbaseBaseController.java
package com.hsiehchou.springcloud.hbase.controller;
import com.hsiehchou.hbase.extractor.SingleColumnMultiVersionRowExtrator;
import com.hsiehchou.hbase.search.HBaseSearchService;
import com.hsiehchou.hbase.search.HBaseSearchServiceImpl;
import com.hsiehchou.springcloud.hbase.service.HbaseBaseService;
import org.apache.hadoop.hbase.client.Get;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.*;
import javax.annotation.Resource;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
@Controller
@RequestMapping(value="/hbase")
public class HbaseBaseController {
private static Logger LOG = LoggerFactory.getLogger(HbaseBaseController.class);
//注入:通过这个注解可以直接拿到 HbaseBaseService 的实例
@Resource
private HbaseBaseService hbaseBaseService;
@ResponseBody
@RequestMapping(value="/search/{table}/{rowkey}", method={RequestMethod.GET,RequestMethod.POST})
public Set<String> search(@PathVariable(value = "table") String table,
@PathVariable(value = "rowkey") String rowkey){
return hbaseBaseService.getSingleColumn(table,rowkey);
}
@ResponseBody
@RequestMapping(value="/search1", method={RequestMethod.GET,RequestMethod.POST})
public Set<String> search1( @RequestParam(name = "table") String table,
@RequestParam(name = "rowkey") String rowkey){
//通过二级索引去找主关联表的rowkey 这个rowkey就是MAC
return hbaseBaseService.getSingleColumn(table,rowkey);
}
@ResponseBody
@RequestMapping(value = "/getHbase",method = {RequestMethod.GET,RequestMethod.POST})
public Set<String> getHbase(@RequestParam(name="table") String table,
@RequestParam(name="rowkey") String rowkey){
return hbaseBaseService.getSingleColumn(table, rowkey);
}
@ResponseBody
@RequestMapping(value = "/getRelation",method = {RequestMethod.GET,RequestMethod.POST})
public Map<String,List<String>> getRelation(@RequestParam(name = "field") String field,
@RequestParam(name = "fieldValue") String fieldValue){
return hbaseBaseService.getRealtion(field,fieldValue);
}
public static void main(String[] args) {
HbaseBaseController hbaseBaseController = new HbaseBaseController();
hbaseBaseController.getHbase("send_mail", "65497873@qq.com");
}
}
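Once the service is registered with Eureka, the endpoints can be exercised over plain HTTP. A minimal client sketch; the host and port follow server.port=8002 above and the hadoop3 deployment used later, so adjust them to your environment:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HbaseQueryClientDemo {
    public static void main(String[] args) throws Exception {
        // hadoop3:8002 is taken from the configuration and deployment in this section; adjust as needed
        URL url = new URL("http://hadoop3:8002/hbase/search1?table=phone&rowkey=18609765012");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // JSON array of phone_mac values
            }
        } finally {
            conn.disconnect();
        }
    }
}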
Build com.hsiehchou.springcloud.hbase.service
Create HbaseBaseService
HbaseBaseService.java
package com.hsiehchou.springcloud.hbase.service;
import com.hsiehchou.hbase.entity.HBaseCell;
import com.hsiehchou.hbase.entity.HBaseRow;
import com.hsiehchou.hbase.extractor.MultiVersionRowExtrator;
import com.hsiehchou.hbase.extractor.SingleColumnMultiVersionRowExtrator;
import com.hsiehchou.hbase.search.HBaseSearchService;
import com.hsiehchou.hbase.search.HBaseSearchServiceImpl;
import org.apache.hadoop.hbase.client.Get;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.util.*;
@Service
public class HbaseBaseService {
private static Logger LOG = LoggerFactory.getLogger(HbaseBaseService.class);
/**
* Get the multi-version values of a single HBase column from the index table test:<field>
* @param field
* @param rowkey
* @return
*/
public Set<String> getSingleColumn(String field,String rowkey){
//Read the main relation table's rowkeys from the index table, i.e. the multi-version MACs for this phone
Set<String> search = null;
HBaseSearchService hBaseSearchService = new HBaseSearchServiceImpl();
String table = "test:"+field;
Get get = new Get(rowkey.getBytes());
try {
get.setMaxVersions(100);
} catch (IOException e) {
e.printStackTrace();
}
Set set = new HashSet<String>();
SingleColumnMultiVersionRowExtrator singleColumnMultiVersionRowExtrator = new SingleColumnMultiVersionRowExtrator("cf".getBytes(), "phone_mac".getBytes(), set);
try {
search = hBaseSearchService.search(table, get, singleColumnMultiVersionRowExtrator);
System.out.println(search.toString());
} catch (IOException e) {
e.printStackTrace();
}
return search;
}
/**
* Get the multi-version values of a single column from the given table
* @param table
* @param rowkey
* @param versions
* @return
*/
public Set<String> getSingleColumn(String table,String rowkey,int versions){
Set<String> search = null;
try {
HBaseSearchService baseSearchService = new HBaseSearchServiceImpl();
Get get = new Get(rowkey.getBytes());
get.setMaxVersions(versions);
Set set = new HashSet<String>();
SingleColumnMultiVersionRowExtrator singleColumnMultiVersionRowExtrator = new SingleColumnMultiVersionRowExtrator("cf".getBytes(), "phone_mac".getBytes(), set);
search = baseSearchService.search(table, get, singleColumnMultiVersionRowExtrator);
} catch (IOException e) {
LOG.error(null,e);
}
System.out.println(search);
return search;
}
/**
* Fetch the full relation records directly from a field value
* using an HBase secondary-index lookup
* @param field
* @param fieldValue
* @return
*/
public Map<String,List<String>> getRelation(String field,String fieldValue){
//Step 1: look up the multi-version rowkeys in the secondary-index table
Map<String,List<String>> map = new HashMap<>();
//The index table to query, e.g. test:send_mail
String table = "test:" + field;
String indexRowkey = fieldValue;
Set<String> relationRowkeys = this.getSingleColumn(table, indexRowkey, 100);
//Step 2: with the main-table rowkeys obtained from the secondary index,
//iterate over them and fetch all multi-version data stored under each rowkey.
//Wrap each relationRowkey in a Get.
List<Get> list = new ArrayList<>();
relationRowkeys.forEach(relationRowkey->{
//fetch everything stored under relationRowkey in the relation table
Get get = new Get(relationRowkey.getBytes());
try {
get.setMaxVersions(100);
} catch (IOException e) {
e.printStackTrace();
}
list.add(get);
});
MultiVersionRowExtrator multiVersionRowExtrator = new MultiVersionRowExtrator();
HBaseSearchService hBaseSearchService = new HBaseSearchServiceImpl();
try {
//<T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException;
List<HBaseRow> search = hBaseSearchService.search("test:relation", list, multiVersionRowExtrator);
search.forEach(hbaseRow->{
Map<String, Collection<HBaseCell>> cellMap = hbaseRow.getCell();
cellMap.forEach((key,value)->{
//convert Map<String,Collection<HBaseCell>> into Map<String,List<String>>
List<String> listValue = new ArrayList<>();
value.forEach(x->{
listValue.add(x.toString());
});
map.put(key,listValue);
});
});
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(map.toString());
return map;
}
public static void main(String[] args) {
HbaseBaseService hbaseBaseService = new HbaseBaseService();
// hbaseBaseService.getRelation("send_mail","65494533@qq.com");
hbaseBaseService.getSingleColumn("phone","18609765012");
}
}
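The service above goes through the project's HBaseSearchService wrapper. As a conceptual sketch only, the same two-step secondary-index lookup can be written with the plain HBase 1.x client; the table names follow the test: namespace used above, and the ZooKeeper quorum is an assumed value:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.HashSet;
import java.util.Set;

public class SecondaryIndexLookupSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop1,hadoop2,hadoop3"); // assumed quorum
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // Step 1: the index table test:phone maps a phone number to multi-version phone_mac values
            Set<String> macs = new HashSet<>();
            try (Table index = conn.getTable(TableName.valueOf("test:phone"))) {
                Get get = new Get(Bytes.toBytes("18609765012"));
                get.setMaxVersions(100);
                Result r = index.get(get);
                if (!r.isEmpty()) {
                    for (Cell c : r.rawCells()) {
                        macs.add(Bytes.toString(CellUtil.cloneValue(c)));
                    }
                }
            }
            // Step 2: each phone_mac is a rowkey in the main relation table test:relation
            try (Table relation = conn.getTable(TableName.valueOf("test:relation"))) {
                for (String mac : macs) {
                    Get get = new Get(Bytes.toBytes(mac));
                    get.setMaxVersions(100);
                    Result r = relation.get(get);
                    if (r.isEmpty()) {
                        continue;
                    }
                    for (Cell c : r.rawCells()) {
                        System.out.println(Bytes.toString(CellUtil.cloneQualifier(c)) + " = "
                                + Bytes.toString(CellUtil.cloneValue(c)));
                    }
                }
            }
        }
    }
}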
7. Build the ES query service
The Jest API goes over HTTP and talks to ES on port 9200.
Dependency:
<dependency>
<groupId>io.searchbox</groupId>
<artifactId>jest</artifactId>
<version>6.3.1</version>
</dependency>
Port 9200 speaks HTTP and is mainly used by external clients.
Port 9300 speaks the TCP transport protocol; Java clients that embed the ES jar communicate over it.
ES cluster nodes also communicate with each other over 9300.
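As a rough illustration of what the Jest wiring does, a JestClient can also be built by hand against the HTTP port 9200. The address matches spring.elasticsearch.jest.uris in the configuration below and is only an example value:

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

public class JestClientSketch {
    public static void main(String[] args) throws Exception {
        // build a Jest client against the HTTP port 9200
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder("http://192.168.116.201:9200")
                .multiThreaded(true)
                .build());
        JestClient client = factory.getObject();

        // simple match_all search against the mail index as a connectivity check
        String query = "{\"query\":{\"match_all\":{}},\"size\":1}";
        Search search = new Search.Builder(query).addIndex("mail").addType("mail").build();
        SearchResult result = client.execute(search);
        System.out.println(result.getJsonString());

        client.close();
    }
}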
Create the xz_bigdata_springcloud_esquery sub-module
Preparation
Create the resources configuration directory
Add the configuration file
application.properties
server.port=8003
logging.level.root=INFO
logging.level.org.hibernate=INFO
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
logging.level.org.hibernate.type.descriptor.sql.BasicExtractor=TRACE
logging.level.com.hsiehchou=DEBUG
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true
eureka.client.serviceUrl.defaultZone=http://root:root@hadoop3:8761/eureka/
spring.application.name=xz-bigdata-springcloud-esquery
eureka.instance.prefer-ip-address=true
# disable the Elasticsearch health check
management.health.elasticsearch.enabled=false
spring.elasticsearch.jest.uris=http://192.168.116.201:9200
# all indexes
esIndexs=wechat,mail,qq
Create the ES microservice startup class
ESqueryApplication.java
package com.hsiehchou.springcloud.es;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.openfeign.EnableFeignClients;
@SpringBootApplication
@EnableDiscoveryClient
@EnableFeignClients
public class ESqueryApplication {
public static void main(String[] args) {
SpringApplication.run(ESqueryApplication.class,args);
}
}
Start Eureka and then the ES microservice; if the service shows up in the Eureka console, registration succeeded.
Build com.hsiehchou.springcloud.es.controller
Create EsBaseController
package com.hsiehchou.springcloud.es.controller;
import com.hsiehchou.springcloud.es.feign.HbaseFeign;
import com.hsiehchou.springcloud.es.service.EsBaseService;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;
import javax.annotation.Resource;
import java.util.List;
import java.util.Map;
import java.util.Set;
@Controller
@RequestMapping(value = "/es")
public class EsBaseController {
@Value("${esIndexs}")
private String esIndexs;
@Resource
private EsBaseService esBaseService;
@Resource
private HbaseFeign hbaseFeign;
/**
* Basic query: filter by index/type, with sorting and paging
* @param indexName
* @param typeName
* @param sortField
* @param sortValue
* @param pageNumber
* @param pageSize
* @return
*/
@ResponseBody
@RequestMapping(value = "/getBaseInfo", method = {RequestMethod.GET, RequestMethod.POST})
public List<Map<String, Object>> getBaseInfo(@RequestParam(name = "indexName") String indexName,
@RequestParam(name = "typeName") String typeName,
@RequestParam(name = "sortField") String sortField,
@RequestParam(name = "sortValue") String sortValue,
@RequestParam(name = "pageNumber") int pageNumber,
@RequestParam(name = "pageSize") int pageSize) {
// Query by data type (indexName/typeName), with sorting (sortField/sortValue)
// and paging (pageNumber/pageSize)
return esBaseService.getBaseInfo(indexName,typeName,sortField,sortValue,pageNumber,pageSize);
}
/**
* Look up trajectory data by an arbitrary field condition
* @param field
* @param fieldValue
* @return
*/
@ResponseBody
@RequestMapping(value = "/getLocus", method = {RequestMethod.GET, RequestMethod.POST})
public List<Map<String, Object>> getLocus(@RequestParam(name = "field") String field,
@RequestParam(name = "fieldValue") String fieldValue) {
Set<String> macs = hbaseFeign.search1(field, fieldValue);
System.out.println(macs.toString());
// Resolve the field/value to phone_mac values via the HBase secondary index (through Feign),
// take the first matching MAC (assumes at least one hit), and query ES for its trajectory
String mac = macs.iterator().next();
return esBaseService.getLocus(mac);
}
/**
* Total record count for every index
* @return
*/
@ResponseBody
@RequestMapping(value="/getAllCount", method={RequestMethod.GET,RequestMethod.POST})
public Map<String,Long> getAllCount(){
Map<String, Long> allCount = esBaseService.getAllCount(esIndexs);
System.out.println(allCount);
return allCount;
}
@ResponseBody
@RequestMapping(value="/group", method={RequestMethod.GET,RequestMethod.POST})
public Map<String,Long> group(@RequestParam(name = "indexName") String indexName,
@RequestParam(name = "typeName") String typeName,
@RequestParam(name = "field") String field){
return esBaseService.aggregation(indexName,typeName,field);
}
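// Local smoke test only: esBaseService and hbaseFeign are not injected outside the Spring context,
// so prefer hitting the running endpoints instead (see the example requests in section 9).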
public static void main(String[] args){
EsBaseController esBaseController = new EsBaseController();
esBaseController.getLocus("phone","18609765432");
}
}
Build com.hsiehchou.springcloud.es.service
Create EsBaseService
package com.hsiehchou.springcloud.es.service;
import com.hsiehchou.es.jest.service.JestService;
import com.hsiehchou.es.jest.service.ResultParse;
import io.searchbox.client.JestClient;
import io.searchbox.core.SearchResult;
import org.springframework.stereotype.Service;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@Service
public class EsBaseService {
// Query by data type (indexName/typeName), with sorting and paging
public List<Map<String, Object>> getBaseInfo(String indexName,
String typeName,
String sortField,
String sortValue,
int pageNumber,
int pageSize) {
//run the query
JestClient jestClient = null;
List<Map<String, Object>> maps = null;
try {
jestClient = JestService.getJestClient();
SearchResult search = JestService.search(jestClient,
indexName,
typeName,
"",
"",
sortField,
sortValue,
pageNumber,
pageSize);
maps = ResultParse.parseSearchResultOnly(search);
} catch (Exception e) {
e.printStackTrace();
} finally {
JestService.closeJestClient(jestClient);
}
return maps;
}
// Could also take a time range, e.g. only the trajectory within the last 3 days
// In ES a text field is analyzed; for exact matching query its keyword sub-field instead, e.g. phone_mac.keyword
public List<Map<String, Object>> getLocus(String mac){
//run the query
JestClient jestClient = null;
List<Map<String, Object>> maps = null;
String[] includes = new String[]{"latitude","longitude","collect_time"};
try {
jestClient = JestService.getJestClient();
SearchResult search = JestService.search(jestClient,
"",
"",
"phone_mac.keyword",
mac,
"collect_time",
"asc",
1,
2000,
includes);
maps = ResultParse.parseSearchResultOnly(search);
} catch (Exception e) {
e.printStackTrace();
} finally {
JestService.closeJestClient(jestClient);
}
return maps;
}
public Map<String,Long> getAllCount(String esIndexs){
Map<String,Long> countMap = new HashMap<>();
JestClient jestClient = null;
try {
jestClient = JestService.getJestClient();
String[] split = esIndexs.split(",");
for (int i = 0; i < split.length; i++) {
String index = split[i];
Long count = JestService.count(jestClient, index, index);
countMap.put(index,count);
}
} catch (Exception e) {
e.printStackTrace();
}finally {
JestService.closeJestClient(jestClient);
}
return countMap;
}
public Map<String,Long> aggregation(String indexName,String typeName,String field){
JestClient jestClient = null;
Map<String, Long> stringLongMap = null;
try {
jestClient = JestService.getJestClient();
SearchResult aggregation = JestService.aggregation(jestClient, indexName, typeName, field);
stringLongMap = ResultParse.parseAggregation(aggregation);
} catch (Exception e) {
e.printStackTrace();
}finally {
JestService.closeJestClient(jestClient);
}
return stringLongMap;
}
}
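To make the .keyword note above concrete, here is a hand-written sketch of the kind of request JestService.search is expected to issue for getLocus: an exact term match on phone_mac.keyword, sorted by collect_time, returning only the trajectory fields. The query body, index name, and MAC value are illustrative, not the project's exact implementation:

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

public class LocusQuerySketch {
    public static void main(String[] args) throws Exception {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(
                new HttpClientConfig.Builder("http://192.168.116.201:9200").build());
        JestClient client = factory.getObject();

        // exact match on the keyword sub-field, sorted by collect_time,
        // returning only the trajectory columns
        String query = "{"
                + "\"_source\":[\"latitude\",\"longitude\",\"collect_time\"],"
                + "\"query\":{\"term\":{\"phone_mac.keyword\":\"aa-aa-aa-aa-aa-aa\"}},"
                + "\"sort\":[{\"collect_time\":{\"order\":\"asc\"}}],"
                + "\"size\":2000}";
        Search search = new Search.Builder(query).addIndex("wechat").addType("wechat").build();
        SearchResult result = client.execute(search);
        System.out.println(result.getJsonString());

        client.close();
    }
}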
This reuses the project's ES base service (JestService / ResultParse).
Trajectory query
The trajectory query also needs the HBase query service, which is called through Feign.
Spring Cloud Feign
Feign is a declarative pseudo-HTTP client that makes writing HTTP clients much simpler: you only create an interface and configure it with annotations, and Feign binds it to the service provider's endpoints, removing most of the client-side boilerplate.
Build com.hsiehchou.springcloud.es.feign
Create HbaseFeign
package com.hsiehchou.springcloud.es.feign;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;
import java.util.Set;
@FeignClient(name = "xz-bigdata-springcloud-hbasequery")
public interface HbaseFeign {
@ResponseBody
@RequestMapping(value="/hbase/search1", method=RequestMethod.GET)
public Set<String> search1(@RequestParam(name = "table") String table,
@RequestParam(name = "rowkey") String rowkey);
}
8. Manual deployment of the microservices
Add the packaging plugins to Maven
<build>
<plugins>
<plugin><!-- copy dependency jars -->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- whether to exclude transitive dependencies -->
<stripVersion>false</stripVersion> <!-- whether to strip version numbers from jar file names -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- copy project dependencies into the jars/ directory -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
<!-- 打成jar包插件 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.4</version>
<configuration>
<archive>
<!-- do not include pom.xml and pom.properties in the generated jar -->
<addMavenDescriptor>false</addMavenDescriptor>
<!-- MANIFEST.MF settings -->
<manifest>
<!-- add the dependency jars to the Class-Path entry of the MANIFEST -->
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<!-- main class used when running the jar directly; set this to the module's own startup class.
The value below does not belong to this project, and the deployment commands later start each
service with java -cp and an explicit main class, so the manifest entry is not consulted. -->
<mainClass>com.cn.hbase.mr.HbaseMr</mainClass>
</manifest>
<!-- <manifestEntries>
add the path of external configuration files to Class-Path here if needed
<Class-Path></Class-Path>
</manifestEntries>-->
</archive>
<outputDirectory>${project.build.directory}/</outputDirectory>
<includes>
<!-- when building the jar, include only class files and config resources -->
<include>**/*.class</include>
<include>**/*.properties</include>
<include>**/*.yml</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
Because the microservices depend on xz_bigdata2, build and install xz_bigdata2 first.
Update the configuration files
defaultZone: http://root:root@hadoop3:8761/eureka/
Change the registry address to the IP of the server Eureka is deployed on.
Do the same for each microservice.
The configuration files shown above already reflect this change.
Deployment
- Deploy the Eureka registry first
Create /usr/chl/springcloud/eureka
Upload the jars directory and the application jar
- Start the registry
Start the Eureka service registry
nohup java -cp /usr/chl/springcloud/eureka/xz_bigdata_springcloud_eureka-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.eureka.EurekaApplication &
View the log
tail -f nohup.out
- Deploy esquery
Start the esquery microservice
nohup java -cp /usr/chl/springcloud/esquery/xz_bigdata_springcloud_esquery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.es.ESqueryApplication &
- Deploy hbasequery
Start the hbasequery microservice
nohup java -cp /usr/chl/springcloud/hbasequery/xz_bigdata_springcloud_hbasequery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.HbaseQueryApplication &
9. Execution (example requests)
hadoop3:8002/hbase/getRelation?field=phone&fieldValue=18609765012
hadoop3:8002/hbase/search1?table=phone&rowkey=18609765012
hadoop3:8002/hbase/getHbase?table=send_mail&rowkey=65497873@qq.com
hadoop3:8002/hbase/getHbase?table=phone&rowkey=18609765012
hadoop3:8002/hbase/search/phone/18609765012
hadoop3:8003/es/getAllCount
hadoop3:8003/es/getBaseInfo
hadoop3:8003/es/getLocus
hadoop3:8003/es/group
XIII. Appendix
1. Test data
mail_source1_1111101.txt
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300088 65497873@qq.com 1789090763 11111111@qq.com 1789097863 今天出去打球吗 send
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300085 65497873@qq.com 1789090764 22222222@qq.com 1789097864 今天出去打球吗 send
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300088 65497873@qq.com 1789090763 33333333@qq.com 1789097863 今天出去打球吗 send
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300085 65497873@qq.com 1789090764 44444444@qq.com 1789097864 今天出去打球吗 send
000000000000000 000000000000000 23.000001 24.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 1323243@qq.com 1789098763 43432543@qq.com 1789098863 今天出去打球吗 send
000000000000000 000000000000000 24.000001 25.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 1323243@qq.com 1789098764 43432543@qq.com 1789098864 今天出去打球吗 send
000000000000000 000000000000000 23.000001 24.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 1323243@qq.com 1789098763 43432543@qq.com 1789098863 今天出去打球吗 send
000000000000000 000000000000000 24.000001 25.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 1323243@qq.com 1789098764 43432543@qq.com 1789098864 今天出去打球吗 send
qq_source1_1111101.txt
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300388 xz 18609765012 ls 1789000653
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300545 xz 18609765012 ls 1789000343
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300658 xz 18609765012 ls 1789000542
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300835 xz 18609765012 ls 1789000263
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557300388 xz 18609765016 ls 1789001653
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557302235 xz 18609765016 ls 1789001343
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303658 xz 18609765016 ls 1789001542
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303835 xz 18609765016 ls 1789001263
000000000000011 000000000000011 23.000031 24.000041 4c-6f-c7-3d-a4-3d 9g-gd-3h-3k-ld-3f 32109246 1557300001 xz 18609765014 ls 1789050653
000000000000011 000000000000011 24.000031 25.000051 7c-8e-d4-a6-3d-5c 54-hg-gi-yx-ef-ge 32109246 1557300005 xz 18609765015 ls 1789070343
000000000000011 000000000000011 23.000031 24.000061 8c-g1-ed-7b-5f-1b 47-fy-vv-hs-ue-fd 32109246 1557300008 xz 18609765017 ls 1789080542
000000000000011 000000000000011 24.000031 25.000071 0c-76-2a-b1-3c-1a f5-nw-hf-ud-ht-ea 32109246 1557300115 xz 18609765010 ls 1789082263
wechat_source1_1111101.txt
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300388 xz 18609765012 ls 1789000653
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300545 xz 18609765012 ls 1789000343
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300658 xz 18609765012 ls 1789000542
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300835 xz 18609765012 ls 1789000263
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557300388 xz 18609765016 ls 1789001653
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557302235 xz 18609765016 ls 1789001343
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303658 xz 18609765016 ls 1789001542
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303835 xz 18609765016 ls 1789001263
000000000000011 000000000000011 23.000031 24.000041 4c-6f-c7-3d-a4-3d 9g-gd-3h-3k-ld-3f 32109246 1557300001 xz 18609765014 ls 1789050653
000000000000011 000000000000011 24.000031 25.000051 7c-8e-d4-a6-3d-5c 54-hg-gi-yx-ef-ge 32109246 1557300005 xz 18609765015 ls 1789070343
000000000000011 000000000000011 23.000031 24.000061 8c-g1-ed-7b-5f-1b 47-fy-vv-hs-ue-fd 32109246 1557300008 xz 18609765017 ls 1789080542
000000000000011 000000000000011 24.000031 25.000071 0c-76-2a-b1-3c-1a f5-nw-hf-ud-ht-ea 32109246 1557300115 xz 18609765010 ls 1789082263
2. Kafka
Create a topic with 1 replica and 3 partitions
kafka-topics --zookeeper hadoop1:2181 --topic chl_test7 --create --replication-factor 1 --partitions 3
Delete a topic
kafka-topics --zookeeper hadoop1:2181 --delete --topic chl_test7
List all topics
kafka-topics --zookeeper hadoop1:2181 --list
Consume
kafka-console-consumer --bootstrap-server hadoop1:9092 --topic chl_test7 --from-beginning
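To push test records into the topic from code instead of the console producer, a minimal sketch using the plain Kafka producer API; it assumes the kafka-clients dependency, the broker and topic follow the commands above, and the field separator is an assumption that must match whatever the streaming job's parser expects:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class TestDataProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // one test record in the layout of mail_source1_1111101.txt;
            // the tab separator here is an assumption, use what the parser expects
            String line = String.join("\t",
                    "000000000000011", "000000000000011", "23.000011", "24.000011",
                    "1c-41-cd-b1-df-3f", "1b-3d-zg-fg-ef-1b", "32109246", "1557300088",
                    "65497873@qq.com", "1789090763", "11111111@qq.com", "1789097863",
                    "今天出去打球吗", "send");
            producer.send(new ProducerRecord<>("chl_test7", line));
        } // close() flushes any pending records
    }
}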
3. kafka2es
Start the Spark Streaming job
spark-submit --master yarn-cluster --num-executors 1 --driver-memory 500m --executor-memory 1g --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar chl_test7 chl_test7
spark-submit
--master yarn-cluster    // run on the YARN cluster
--num-executors 1    // number of executor processes
--driver-memory 500m    // driver memory
--executor-memory 1g    // memory per executor
--executor-cores 1    // cores (threads) per executor
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',')    // dependency jars to ship
--class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming    // main class
/usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar    // application jar location
4. YARN
Dump a YARN application's logs to a file
yarn logs -applicationId application_1561627166793_0002 > log.log
View the log
more log.log
cat log.log
5. The CDH console (port 7180) won't open
Check the cloudera-scm-server status
service cloudera-scm-server status
View the cloudera-scm-server log
cat /var/log/cloudera-scm-server/cloudera-scm-server.log
Restart cloudera-scm-server
service cloudera-scm-server restart
6. CDH JDK setting (important)
/usr/local/jdk1.8
7. Alerting (warning streaming task)
spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.warn.WarningStreamingTask /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
8. Kibana Dev Tools
GET _search
{
"query": {
"match_all": {}
}
}
GET _cat/indices
DELETE tanslator_test1111
DELETE qq
DELETE wechat
DELETE mail
GET wechat
GET mail
GET _search
GET mail/_search
GET mail/_mapping
PUT mail
PUT mail/mail/_mapping
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"send_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_time":{"type": "long"},
"accept_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_time":{"type": "long"},
"mail_content":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"mail_type":{"type": "keyword"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
GET qq/_search
GET qq/_mapping
PUT qq
PUT qq/qq/_mapping
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
GET wechat/_search
GET wechat/_mapping
PUT wechat
PUT wechat/wechat/_mapping
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
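The same index setup can also be scripted from Java with Jest's index actions instead of pasting into Dev Tools. A sketch with only two of the mail fields shown; extend it with the full mapping above, and treat the ES address as an example value:

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.indices.CreateIndex;
import io.searchbox.indices.mapping.PutMapping;

public class MailIndexSetupSketch {
    public static void main(String[] args) throws Exception {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(
                new HttpClientConfig.Builder("http://192.168.116.201:9200").build());
        JestClient client = factory.getObject();

        // PUT mail
        client.execute(new CreateIndex.Builder("mail").build());

        // PUT mail/mail/_mapping (only two fields shown; extend with the full mapping above)
        String mapping = "{\"properties\":{"
                + "\"phone_mac\":{\"type\":\"text\",\"fields\":"
                + "{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}},"
                + "\"collect_time\":{\"type\":\"long\"}}}";
        client.execute(new PutMapping.Builder("mail", "mail", mapping).build());

        client.close();
    }
}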
9. Hive
Write Kafka data into Hive
spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.Kafka2HiveTest /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
show tables;
hdfs dfs -ls /apps/hive/warehouse/external
hdfs dfs -rm -r /apps/hive/warehouse/external/mail
drop table mail;
desc qq;
select * from qq limit 1;
Note: if the CDH Hive version does not match its corresponding Spark version, the query below will fail.
select count(*) from qq;
Merge small files (scheduled daily at 01:00 via cron)
crontab -e
0 1 * * * spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.CombineHdfs /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
10. ZooKeeper
Start the ZooKeeper client
zookeeper-client
Clear consumer offsets
rmr /consumers/WarningStreamingTask2/offsets
rmr /consumers/Kafka2HiveTest/offsets
rmr /consumers/DataRelationStreaming1/offsets
11. HBase
spark-submit --master local[1] --num-executors 1 --driver-memory 500m --executor-memory 1g --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hbase.DataRelationStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
hbase shell
list
create 't1','cf'
desc 't1'
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','66666666'
put 't1','aa-aa-aa-aa-aa-aa','cf:weixin','weixin1'
put 't1','aa-aa-aa-aa-aa-aa','cf:mail','66666@qq.com'
scan 't1'
Enable multiple versions on the table
alter 't1',{NAME=>'cf',VERSIONS=>50}
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','77777777'
get 't1','aa-aa-aa-aa-aa-aa',{COLUMN=>'cf',VERSIONS=>10}
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','55555555'
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','88888888',1290300544
Run DataRelationStreaming
scan 'test:relation'
get 'test:username','andiy'
scan 'test:relation'
For mail data, substitute the actual phone_mac (or mail) rowkey for the empty rowkey below:
get 'test:relation','',{COLUMN=>'cf',VERSIONS=>10}
disable 'test:imei'
drop 'test:imei'
disable 'test:imsi'
drop 'test:imsi'
disable 'test:phone'
drop 'test:phone'
disable 'test:phone_mac'
drop 'test:phone_mac'
disable 'test:relation'
drop 'test:relation'
disable 'test:send_mail'
drop 'test:send_mail'
disable 'test:username'
drop 'test:username'
12. Spring Cloud
Start the Eureka service registry
nohup java -cp /usr/chl/springcloud/eureka/xz_bigdata_springcloud_eureka-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.eureka.EurekaApplication &
View the log
tail -f nohup.out
Start the esquery microservice
nohup java -cp /usr/chl/springcloud/esquery/xz_bigdata_springcloud_esquery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.es.ESqueryApplication &
Start the hbasequery microservice
nohup java -cp /usr/chl/springcloud/hbasequery/xz_bigdata_springcloud_hbasequery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.HbaseQueryApplication &