I. Background: the data
1. What data can WiFi capture?
Phone numbers
Organizations
Web page snapshots
Forum posts
Weibo posts
Email
IM chats
Form data
App usage
2. The value of WiFi
Customer experience: convenience for customers; part of the basic infrastructure
Customer data: precision marketing, capturing customers' online behavior, collecting customer information, an additional customer contact channel
3. How WiFi data is acquired
A Wi-Fi network can capture the IMSI numbers of nearby smartphones. The root cause of this wireless tracking and monitoring lies in how smartphones (both Android and iOS devices) connect to Wi-Fi networks.
Two protocols are widely implemented in most modern mobile operating systems:
Extensible Authentication Protocol (EAP)
Authentication and Key Agreement (AKA)
These protocols let a smartphone authenticate to a known Wi-Fi network using its own IMSI number, so the device connects automatically without any interaction from its owner.
4. WiFi data applications
Profiling system
5. Data architecture
6. Data structure
(1) File naming
DataType_Source_UUID.txt
e.g. BASE_SOURCE_UUID.txt
A common set of field standards and type standards must be defined.
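The sketch below shows how such a file name can be split back into its data type, source and UUID parts. The class and the sample file name are illustrative only and are not part of the project code.

```java
// Illustrative only: split a collected file name such as BASE_SOURCE_UUID.txt
// into its data type, source and UUID parts.
public class FileNameParser {

    public static String[] parse(String fileName) {
        // strip the .txt suffix, then split on the underscore separators
        String base = fileName.toLowerCase().endsWith(".txt")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        String[] parts = base.split("_", 3);          // [dataType, source, uuid]
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("wechat_device01_123e4567.txt");   // hypothetical file name
        System.out.println(p[0] + " / " + p[1] + " / " + p[2]);
    }
}
```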
(2) Fields
(3) Common fields
Field | Description | Example / Format | Notes |
---|---|---|---|
imei | IMEI number, the handset's unique identifier | an IMEI consists of 15-17 digits | |
imsi | IMSI, the SIM card's unique identifier | 460011418603055 | 14-15 digits |
longitude | Longitude | accurate to 6 decimal places | |
latitude | Latitude | accurate to 6 decimal places | |
phone_mac | Handset MAC address | format must be normalized (cleaned) to aa-aa-aa-aa-aa-aa (hex characters 0-9, a-f) | |
device_mac | Collection device MAC address | format must be normalized (cleaned) to aa-aa-aa-aa-aa-aa (any digits and letters) | |
device_number | Collection device number | | |
collect_time | Collection time | | |
WeChat data (wechat)
Field | Description | Example / Format | Notes |
---|---|---|---|
username | WeChat nickname | | |
phone | Phone number | | |
object_username | The other party's WeChat ID | | |
send_message | Message sent (encrypted, content cannot be recovered) | | |
accept_message | Message received (encrypted, content cannot be recovered) | | |
message_time | Message time | | |
Email data (mail)
Field | Description | Example / Format | Notes |
---|---|---|---|
send_mail | Sender address | | |
send_time | Send time | | |
accept_mail | Recipient address | | |
accept_time | Receive time | | |
mail_content | Message content | | |
mail_type | Direction: sent or received | send / accept | |
Search data (search)
Field | Description | Example / Format | Notes |
---|---|---|---|
search_content | Search query | | |
search_url | Search URL | | |
search_type | Search engine | | |
search_time | Search time | | |
Base data (base)
Field | Description | Example / Format | Notes |
---|---|---|---|
name | Name | | |
is_marry | Married or not | | |
phone | Phone number | | |
address | Registered (household) address | | |
address_new | Current residential address | | |
birthday | Date of birth | | |
car_number | License plate number | | |
idcard | ID card number | | |
Question: how are the data structure and the data fields determined?
They are determined according to the actual business requirements.
II. Building the base infrastructure
1. Creating the Maven parent project
The top-level pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata2</artifactId>
<packaging>pom</packaging>
<version>1.0-SNAPSHOT</version>
<modules>
<module>xz_bigdata_common</module>
<module>xz_bigdata_es</module>
<module>xz_bigdata_flume</module>
<module>xz_bigdata_hbase</module>
<module>xz_bigdata_kafka</module>
<module>xz_bigdata_redis</module>
<module>xz_bigdata_resources</module>
<module>xz_bigdata_spark</module>
</modules>
<name>xz_bigdata2</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<cdh.version>cdh5.14.0</cdh.version>
<junit.version>4.12</junit.version>
<org.slf4j.version>1.7.5</org.slf4j.version>
<zookeeper.version>3.4.5</zookeeper.version>
<scala.version>2.10.5</scala.version>
</properties>
<repositories>
<repository>
<id>Akka repository</id>
<url>https://repo.akka.io/releases</url>
</repository>
<!--cloudera依赖-->
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<!--日志依赖-->
<dependencies>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${org.slf4j.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
2. Overall project structure
3. Creating the sub-modules
Select xz_bigdata2, right-click and choose New > Module to create a Maven sub-module; all of the modules shown in the figure above were created this way.
Note: use JDK 1.8 or later for development. The code relies on JDK 1.8-specific features, so it will not compile on lower versions.
Ctrl+Shift+Alt+S: opens Project Structure, where the project-level settings can be adjusted.
Ctrl+Alt+S: opens Settings, where the local Maven installation can be configured (under Build, Execution, Deployment > Build Tools > Maven, set the path to your local Maven repository).
Settings is also where plugins are installed, as mentioned earlier; Maven Helper and, later on, the Scala plugin can both be installed from there.
III. Common module development
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_common</artifactId>
<name>xz_bigdata_common</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<ant.version>1.9.1</ant.version>
<jaxen.version>1.1.6</jaxen.version>
<guava.version>12.0.1</guava.version>
<dom4j.version>1.6.1</dom4j.version>
<fastjson.version>1.2.5</fastjson.version>
<disruptor.version>3.3.6</disruptor.version>
<org.slf4j.version>1.7.5</org.slf4j.version>
<commons.io.version>2.4</commons.io.version>
<httpclient.version>4.2.5</httpclient.version>
<commons.exec.version>1.3</commons.exec.version>
<commons.lang.version>2.4</commons.lang.version>
<commons-vfs2.version>2.1</commons-vfs2.version>
<commons.math3.version>3.4.1</commons.math3.version>
<commons.logging.version>1.2</commons.logging.version>
<commons-httpclient.version>3.1</commons-httpclient.version>
<commons.collections4.version>4.1</commons.collections4.version>
<commons.configuration.version>1.6</commons.configuration.version>
<mysql.connector.version>5.1.46</mysql.connector.version>
<commons-dbutils.version>1.6</commons-dbutils.version>
</properties>
<dependencies>
<dependency>
<groupId>commons-dbutils</groupId>
<artifactId>commons-dbutils</artifactId>
<version>${commons-dbutils.version}</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.connector.version}</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${org.slf4j.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>${org.slf4j.version}</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>${commons.io.version}</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>${commons.lang.version}</version>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>${commons.configuration.version}</version>
</dependency>
<dependency>
<groupId>dom4j</groupId>
<artifactId>dom4j</artifactId>
<version>${dom4j.version}</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>${fastjson.version}</version>
</dependency>
<!-- <dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>-->
</dependencies>
</project>
1. config/ConfigUtil.java - reads configuration files
package com.hsiehchou.common.config;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
public class ConfigUtil {
private static final Logger LOG = LoggerFactory.getLogger(ConfigUtil.class);
private static ConfigUtil configUtil;
public static synchronized ConfigUtil getInstance(){
if(configUtil == null){
configUtil = new ConfigUtil();
}
return configUtil;
}
public Properties getProperties(String path){
Properties properties = new Properties();
try {
LOG.info("Loading configuration file " + path);
//read the configuration file from the classpath as a stream
InputStream in = this.getClass().getClassLoader().getResourceAsStream(path);
properties.load(in);
LOG.info("Configuration file " + path + " loaded successfully");
} catch (IOException e) {
LOG.error("Failed to load configuration file " + path, e);
}
return properties;
}
public static void main(String[] args) {
ConfigUtil instance = ConfigUtil.getInstance();
Properties properties = instance.getProperties("common/datatype.properties");
System.out.println(properties);
}
}
2. config/JsonReader.java
package com.hsiehchou.common.config;
import org.apache.commons.io.FileUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.File;
public class JsonReader {
private static Logger LOG = LoggerFactory.getLogger(JsonReader.class);
public static String readJson(String json_path){
JsonReader jsonReader = new JsonReader();
return jsonReader.getJson(json_path);
}
private String getJson(String json_path){
String jsonStr = "";
try {
String path = getClass().getClassLoader().getResource(json_path).toString();
path = path.replace("\\", "/");
if (path.contains(":")) {
path = path.replace("file:/","");
}
jsonStr = FileUtils.readFileToString(new File(path), "UTF-8");
LOG.error("读取json文件{}成功",path);
} catch (Exception e) {
LOG.error("读取json文件失败",e);
}
return jsonStr;
}
}
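A usage sketch follows: it reads a mapping file from the classpath and parses it with fastjson, which is already declared in this module's pom.xml. The resource path is an assumption based on the Resources module layout described later.

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.hsiehchou.common.config.JsonReader;

public class JsonReaderDemo {
    public static void main(String[] args) {
        // assumed path: es/mapping/base.json inside xz_bigdata_resources
        String jsonStr = JsonReader.readJson("es/mapping/base.json");
        JSONObject mapping = JSON.parseObject(jsonStr);
        // print the field names declared in the mapping
        System.out.println(mapping.getJSONObject("properties").keySet());
    }
}
```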
3. adjuster/Adjuster.java - data adjustment interface
package com.hsiehchou.common.adjuster;
/**
* Data adjustment interface
*/
public interface Adjuster<T, E> {
E doAdjust(T data);
}
4. adjuster/StringAdjuster.java
package com.hsiehchou.common.adjuster;
public abstract class StringAdjuster<E> implements Adjuster<String, E> {}
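As an example of how this adjuster hierarchy is meant to be used, the class below (not part of the project) normalizes a MAC address to the aa-aa-aa-aa-aa-aa format required by the phone_mac and device_mac fields.

```java
import com.hsiehchou.common.adjuster.StringAdjuster;

public class MacAdjuster extends StringAdjuster<String> {
    @Override
    public String doAdjust(String data) {
        if (data == null) {
            return null;
        }
        // keep only hex characters, then re-insert the dashes
        String hex = data.replaceAll("[^0-9a-fA-F]", "").toLowerCase();
        if (hex.length() != 12) {
            return data;   // leave unexpected input unchanged
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12; i += 2) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(hex, i, i + 2);
        }
        return sb.toString();
    }
}
```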
5. file/FileCommon.java
package com.hsiehchou.common.file;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.URL;
import java.util.List;
public class FileCommon {
private FileCommon(){}
/**
* Check whether a file exists
* @param name
* @return
*/
public static boolean exist(String name){
return exist(new File(name));
}
public static boolean exist(File file){
return file.exists();
}
/**
* Create a file, creating parent directories if necessary
* @param file
* @return
* @throws IOException
*/
public static boolean createFile(String file) throws IOException {
return createFile(new File(file));
}
public static boolean createFile(File file) throws IOException {
if(!file.exists()){
File parentDir = file.getParentFile();
if(parentDir != null && !parentDir.exists()){
//if the parent directories cannot be created, report failure
if(!parentDir.mkdirs()){
return false;
}
}
return file.createNewFile();
}
return true;
}
/**
* Read the contents of a file, line by line
* @param file
* @return
* @throws IOException
*/
public static List<String> readLines(String file) throws IOException{
return readLines(new File(file), "UTF-8");
}
public static List<String> readLines(String file, String encording) throws IOException{
return readLines(new File(file), encording);
}
public static List<String> readLines(File file, String encording) throws IOException {
List<String> lines = null;
if(FileCommon.exist(file)) {
FileInputStream fileInputStream = new FileInputStream(file);
lines = IOUtils.readLines(fileInputStream, encording);
fileInputStream.close();
}
return lines;
}
/**
* Get the file name prefix (the part before the last dot)
* @param fileName
* @return
*/
public static String getPrefix(String fileName){
String prefix = fileName;
int pos = fileName.lastIndexOf(".");
if (pos != -1){
prefix = fileName.substring(0,pos);
}
return prefix;
}
/**
* Get the file name suffix (extension), lower-cased
* @param fileName
* @return
*/
public static String getFilePostfix(String fileName){
String filePostfix = fileName.substring(fileName.lastIndexOf(".") + 1);
return filePostfix.toLowerCase();
}
/**
* Delete a file
* @param filePath
* @return
*/
public static boolean delFile(String filePath) {
boolean flag = false;
File file = new File(filePath);
if (file.isFile() && file.exists()) {
flag = file.delete();
}
return flag;
}
/**
* Move a file, overwriting the target if it exists
* @param oldPath
* @param newPath
* @return
*/
public static boolean mvFile(String oldPath,String newPath){
boolean flag = false;
File oldfile = new File(oldPath);
File newfile = new File(newPath);
if(oldfile.isFile() && oldfile.exists()){
if(newfile.exists()){
delFile(newfile.getAbsolutePath());
}
flag = oldfile.renameTo(newfile);
}
return flag;
}
/**
* Recursively delete a directory
* @param dir
* @return
*/
public static boolean deleteDir(File dir){
if (dir.isDirectory()) {
String[] children = dir.list();
//recursively delete the children of this directory first
if(children!=null){
for (int i=0; i<children.length; i++) {
boolean success = deleteDir(new File(dir, children[i]));
if (!success) {
return false;
}
}
}
}
// the directory is empty at this point and can be deleted
return dir.delete();
}
//recursively create parent directories; used by the decompression-related classes
public static void mkdirs(File file) {
File parent = file.getParentFile();
if (parent != null && (!parent.exists())) {
parent.mkdirs();
}
}
public static String getJarFilePathByClass(String clazz) throws ClassNotFoundException {
return getJarFilePathByClass(Class.forName(clazz));
}
public static String getJarFileDirByClass(String clazz) throws ClassNotFoundException {
return getJarFileDirByClass(Class.forName(clazz));
}
public static String getJarFilePathByClass(Class<?> clazz){
return new File(clazz.getProtectionDomain().getCodeSource().getLocation().getFile()).getAbsolutePath();
}
public static String getJarFileDirByClass(Class<?> clazz){
return new File(getJarFilePathByClass(clazz)).getParent();
}
public static String getAbstractPath(String abstractPath) throws Exception{
URL url = FileCommon.class.getClassLoader().getResource(abstractPath);
System.out.println("配置文件路径为" + url);
File file = new File(url.getFile());
String content= FileUtils.readFileToString(file,"UTF-8");
return content;
}
public static String getAbstractPath111(String abstractPath) throws Exception{
File file = new File(abstractPath);
String content= FileUtils.readFileToString(file,"UTF-8");
return content;
}
}
6. filter/Filter.java - top-level data filter interface
package com.hsiehchou.common.filter;
/**
* Top-level data filter interface
*/
public interface Filter<T> {
boolean filter(T obj);
}
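An illustrative implementation (not part of the project): a filter that keeps only records carrying a non-empty phone_mac. Here the convention assumed is that true means "keep the record".

```java
import com.hsiehchou.common.filter.Filter;

import java.util.Map;

public class PhoneMacFilter implements Filter<Map<String, String>> {
    @Override
    public boolean filter(Map<String, String> record) {
        String mac = (record == null) ? null : record.get("phone_mac");
        return mac != null && !mac.trim().isEmpty();
    }
}
```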
7. net/HttpRequest.java
package com.hsiehchou.common.net;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.Map;
public class HttpRequest {
private static final Logger LOG = LoggerFactory.getLogger(HttpRequest.class);
/**
* Send a GET request to the given URL
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
* @return the response body of the remote resource
*/
public static String sendGet(String url, String param) {
String result = "";
BufferedReader in = null;
try {
String urlNameString = url + "?" + param;
URL realUrl = new URL(urlNameString);
// 打开和URL之间的连接
URLConnection connection = realUrl.openConnection();
// 设置通用的请求属性
connection.setRequestProperty("accept", "*/*");
connection.setRequestProperty("connection", "Keep-Alive");
connection.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
// 建立实际的连接
connection.connect();
// 获取所有响应头字段
//Map<String, List<String>> map = connection.getHeaderFields();
// 遍历所有的响应头字段
// 定义 BufferedReader输入流来读取URL的响应
in = new BufferedReader(new InputStreamReader(connection.getInputStream(),"UTF-8"));
String line;
while ((line = in.readLine()) != null) {
result += line;
}
} catch (Exception e) {
LOG.info("发送GET请求出现异常!" + (url+param));
System.out.println("发送GET请求出现异常!" + e);
e.printStackTrace();
}
// 使用finally块来关闭输入流
finally {
try {
if (in != null) {
in.close();
}
} catch (Exception e2) {
e2.printStackTrace();
}
}
return result;
}
/**
* Send a GET request with an Authorization header to the given URL
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
* @return the response body of the remote resource
*/
public static String sendGet(String url, String param,String authorization) {
String result = "";
BufferedReader in = null;
try {
String urlNameString = url + "?" + param;
URL realUrl = new URL(urlNameString);
// 打开和URL之间的连接
URLConnection connection = realUrl.openConnection();
// 设置通用的请求属性
connection.setRequestProperty("accept", "*/*");
connection.setRequestProperty("connection", "Keep-Alive");
connection.setRequestProperty("Authorization", authorization);
connection.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
// 建立实际的连接
connection.connect();
// 获取所有响应头字段
connection.getHeaderFields();
// 遍历所有的响应头字段
/* for (String key : map.keySet()) {
System.out.println(key + "--->" + map.get(key));
}*/
// 定义 BufferedReader输入流来读取URL的响应
in = new BufferedReader(new InputStreamReader(
connection.getInputStream(),"UTF-8"));
String line;
while ((line = in.readLine()) != null) {
result += line;
}
} catch (Exception e) {
LOG.info("发送POST请求出现异常!" + (url+param));
System.out.println("发送POST请求出现异常!" + e);
e.printStackTrace();
}
// 使用finally块来关闭输入流
finally {
try {
if (in != null) {
in.close();
}
} catch (Exception e2) {
e2.printStackTrace();
}
}
return result;
}
public static void main(String[] args) throws Exception{
}
/**
* Send a POST request to the given URL
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
* @return the response body of the remote resource
*/
public static String sendPost(String url, String param) {
PrintWriter out = null;
BufferedReader in = null;
String result = "";
try {
URL realUrl = new URL(url);
// 打开和URL之间的连接
URLConnection conn = realUrl.openConnection();
// 设置通用的请求属性
conn.setRequestProperty("Content-Type","application/json");
//conn.setInstanceFollowRedirects(false);
// conn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
conn.setRequestProperty("accept", "*/*");
conn.setRequestProperty("connection", "Keep-Alive");
conn.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
// 发送POST请求必须设置如下两行
conn.setReadTimeout(30000);
conn.setDoOutput(true);
conn.setDoInput(true);
// 获取URLConnection对象对应的输出流
out = new PrintWriter(conn.getOutputStream());
// 发送请求参数
out.print(param);
// flush输出流的缓冲
out.flush();
// 定义BufferedReader输入流来读取URL的响应
InputStream inputStream = conn.getInputStream();
in = new BufferedReader(new InputStreamReader(inputStream,"UTF-8"));
String line;
while ((line = in.readLine()) != null) {
result += line;
}
}
catch (IOException e) {
LOG.info("发送POST请求出现异常!" + (url+param),e);
}
//使用finally块来关闭输出流、输入流
finally{
try{
if(out!=null){
out.close();
}
if(in!=null){
in.close();
}
}
catch(IOException ex){
ex.printStackTrace();
}
}
return result;
}
/*
* params: the URL parameters to send; values are URL-encoded as UTF-8
*/
public static String sendPostMessage(String url1,Map<String,Object> params){
String response = null;
Reader in = null;
try {
//访问准备
URL url = new URL(url1);
//开始访问
StringBuilder postData = new StringBuilder();
for (Map.Entry<String,Object> param : params.entrySet()) {
if (postData.length() != 0) postData.append('&');
postData.append(URLEncoder.encode(param.getKey(), "UTF-8"));
postData.append('=');
postData.append(URLEncoder.encode(String.valueOf(param.getValue()), "UTF-8"));
}
byte[] postDataBytes = postData.toString().getBytes("UTF-8");
URLConnection conn = url.openConnection();
//URLConnection conn = url.openConnection();
//conn.setRequestMethod("POST");
//conn.setInstanceFollowRedirects(false);
//conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
conn.setRequestProperty("Content-Type", "application/json");
conn.setRequestProperty("Content-Length", String.valueOf(postDataBytes.length));
conn.setDoOutput(true);
conn.getOutputStream().write(postDataBytes);
in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
StringBuilder sb = new StringBuilder();
for (int c; (c = in.read()) >= 0;)
sb.append((char)c);
response = sb.toString();
//System.out.println(response);
} catch (IOException e) {
LOG.error(null,e);
}finally {
if(in != null){
try {
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return response;
}
/**
* Send a POST request to the given URL without reading the response body
* @param url the request URL
* @param param the request parameters, in name1=value1&name2=value2 form
*/
public static void sendPostWithoutReturn(String url, String param) {
PrintWriter out = null;
BufferedReader in = null;
String result = "";
try {
URL realUrl = new URL(url);
// 打开和URL之间的连接
HttpURLConnection conn = (HttpURLConnection )realUrl.openConnection();
// 设置通用的请求属性
conn.setRequestProperty("Content-Type","application/json");
conn.setRequestProperty("accept", "*/*");
conn.setRequestProperty("connection", "Keep-Alive");
conn.setRequestProperty("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
//根据需求设置读超时的时间
conn.setReadTimeout(1000);
// 发送POST请求必须设置如下两行
conn.setDoOutput(true);
conn.setDoInput(true);
// 获取URLConnection对象对应的输出流
out = new PrintWriter(conn.getOutputStream());
// 发送请求参数
out.print(param);
// flush输出流的缓冲
out.flush();
// 定义BufferedReader输入流来读取URL的响应
if (conn.getResponseCode() == 200) {
System.out.println("连接成功,传送数据...");
} else {
System.out.println("连接失败,错误代码:"+conn.getResponseCode());
}
}
catch (IOException e) {
LOG.info("发送POST请求出现异常!" + (url+param),e);
}
//使用finally块来关闭输出流、输入流
finally{
try{
if(out!=null){
out.close();
}
if(in!=null){
in.close();
}
}
catch(Exception ex){
ex.printStackTrace();
}
}
}
}
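A usage sketch with placeholder URLs (they are not services that exist in this project):

```java
import com.hsiehchou.common.net.HttpRequest;

public class HttpRequestDemo {
    public static void main(String[] args) {
        // GET: parameters are appended as ?name1=value1&name2=value2
        String getResult = HttpRequest.sendGet("http://localhost:9200/_cat/indices", "v=true");
        System.out.println(getResult);

        // POST: the body is sent as-is with Content-Type: application/json
        String postResult = HttpRequest.sendPost("http://localhost:8080/api", "{\"key\":\"value\"}");
        System.out.println(postResult);
    }
}
```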
8. netb/db/DBCommon.java - basic class for opening and closing MySQL connections
package com.hsiehchou.common.netb.db;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.Properties;
public class DBCommon {
private static Logger LOG = LoggerFactory.getLogger(DBCommon.class);
private static String MYSQL_PATH = "common/mysql.properties";
private static Properties properties = ConfigUtil.getInstance().getProperties(MYSQL_PATH);
private static Connection conn ;
private DBCommon(){}
public static void main(String[] args) {
System.out.println(properties);
Connection xz_bigdata = DBCommon.getConn("test");
System.out.println(xz_bigdata);
}
//TODO: read these settings from the configuration file as well
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
private static final String USER_NAME = properties.getProperty("user");
private static final String PASSWORD = properties.getProperty("password");
private static final String IP = properties.getProperty("db_ip");
private static final String PORT = properties.getProperty("db_port");
private static final String DB_CONFIG = "?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&autoReconnect=true&failOverReadOnly=false";
static {
try {
Class.forName(JDBC_DRIVER);
} catch (ClassNotFoundException e) {
LOG.error(null, e);
}
}
/**
* Get a database connection
* @param dbName
* @return
*/
public static Connection getConn(String dbName) {
Connection conn = null;
String connstring = "jdbc:mysql://"+IP+":"+PORT+"/"+dbName+DB_CONFIG;
try {
conn = DriverManager.getConnection(connstring, USER_NAME, PASSWORD);
} catch (SQLException e) {
e.printStackTrace();
LOG.error(null, e);
}
return conn;
}
/**
* @param url eg:"jdbc:oracle:thin:@172.16.1.111:1521:d406"
* @param driver eg:"oracle.jdbc.driver.OracleDriver"
* @param user eg:"ucase"
* @param password eg:"ucase123"
* @return
* @throws ClassNotFoundException
* @throws SQLException
*/
public static Connection getConn(String url, String driver, String user,
String password) throws ClassNotFoundException, SQLException{
Class.forName(driver);
conn = DriverManager.getConnection(url, user, password);
return conn;
}
public static void close(Connection conn){
try {
if( conn != null ){
conn.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Statement statement){
try {
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,PreparedStatement statement){
try {
if( conn != null ){
conn.close();
}
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,Statement statement,ResultSet resultSet) throws SQLException{
if( resultSet != null ){
resultSet.close();
}
if( statement != null ){
statement.close();
}
if( conn != null ){
conn.close();
}
}
}
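A usage sketch combining DBCommon with commons-dbutils (already declared in this module's pom.xml); the database name and SQL here are placeholders.

```java
import com.hsiehchou.common.netb.db.DBCommon;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.MapListHandler;

import java.sql.Connection;
import java.util.List;
import java.util.Map;

public class DBCommonDemo {
    public static void main(String[] args) throws Exception {
        Connection conn = DBCommon.getConn("test");   // database name is a placeholder
        try {
            QueryRunner runner = new QueryRunner();
            // placeholder SQL -- replace with a real table
            List<Map<String, Object>> rows =
                    runner.query(conn, "SELECT * FROM some_table LIMIT 10", new MapListHandler());
            rows.forEach(System.out::println);
        } finally {
            DBCommon.close(conn);
        }
    }
}
```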
9. project/datatype/DataTypeProperties.java
package com.hsiehchou.common.project.datatype;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.*;
public class DataTypeProperties {
private static final Logger logger = LoggerFactory.getLogger(DataTypeProperties.class);
private static final String DATA_PATH = "common/datatype.properties";
public static Map<String,ArrayList<String>> dataTypeMap = null;
static {
Properties properties = ConfigUtil.getInstance().getProperties(DATA_PATH);
dataTypeMap = new HashMap<>();
Set<Object> keys = properties.keySet();
keys.forEach(key->{
String[] split = properties.getProperty(key.toString()).split(",");
dataTypeMap.put(key.toString(),new ArrayList<>(Arrays.asList(split)));
});
}
public static void main(String[] args) {
Map<String, ArrayList<String>> dataTypeMap = DataTypeProperties.dataTypeMap;
System.out.println(dataTypeMap.toString());
}
}
10. regex/Validation.java - validation utility class
package com.hsiehchou.common.regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Validation utility class
*/
public class Validation {
// ------------------ constant definitions
/**
* Email正则表达式=
* "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$"
* ;
*/
// public static final String EMAIL =
// "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";;
public static final String EMAIL = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)+";
/**
* 电话号码正则表达式=
* (^(\d{2,4}[-_-—]?)?\d{3,8}([-_-—]?\d{3,8})?([-_-—]?\d{1,7})?$)|
* (^0?1[35]\d{9}$)
*/
public static final String PHONE = "(^(\\d{2,4}[-_-—]?)?\\d{3,8}([-_-—]?\\d{3,8})?([-_-—]?\\d{1,7})?$)|(^0?1[35]\\d{9}$)";
/**
* 手机号码正则表达式=^(13[0-9]|15[0-9]|18[0-9])\d{8}$
*/
public static final String MOBILE = "^((13[0-9])|(14[5-7])|(15[^4])|(17[0-8])|(18[0-9]))\\d{8}$";
/**
* Integer正则表达式 ^-?(([1-9]\d*$)|0)
*/
public static final String INTEGER = "^-?(([1-9]\\d*$)|0)";
/**
* 正整数正则表达式 >=0 ^[1-9]\d*|0$
*/
public static final String INTEGER_NEGATIVE = "^[1-9]\\d*|0$";
/**
* 负整数正则表达式 <=0 ^-[1-9]\d*|0$
*/
public static final String INTEGER_POSITIVE = "^-[1-9]\\d*|0$";
/**
* Double正则表达式 ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
*/
public static final String DOUBLE = "^-?([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0)$";
/**
* 正Double正则表达式 >=0 ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$
*/
public static final String DOUBLE_NEGATIVE = "^[1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0$";
/**
* 负Double正则表达式 <= 0 ^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$
*/
public static final String DOUBLE_POSITIVE = "^(-([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*))|0?\\.0+|0$";
/**
* 年龄正则表达式 ^(?:[1-9][0-9]?|1[01][0-9]|120)$ 匹配0-120岁
*/
public static final String AGE = "^(?:[1-9][0-9]?|1[01][0-9]|120)$";
/**
* 邮编正则表达式 [0-9]\d{5}(?!\d) 国内6位邮编
*/
public static final String CODE = "[0-9]\\d{5}(?!\\d)";
/**
* 匹配由数字、26个英文字母或者下划线组成的字符串 ^\w+$
*/
public static final String STR_ENG_NUM_ = "^\\w+$";
/**
* 匹配由数字和26个英文字母组成的字符串 ^[A-Za-z0-9]+$
*/
public static final String STR_ENG_NUM = "^[A-Za-z0-9]+";
/**
* 匹配由26个英文字母组成的字符串 ^[A-Za-z]+$
*/
public static final String STR_ENG = "^[A-Za-z]+$";
/**
* 过滤特殊字符串正则 regEx=
* "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
*/
public static final String STR_SPECIAL = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
/***
* 日期正则 支持: YYYY-MM-DD YYYY/MM/DD YYYY_MM_DD YYYYMMDD YYYY.MM.DD的形式
*/
public static final String DATE_ALL = "((^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(10|12|0?[13578])([-\\/\\._]?)(3[01]|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(11|0?[469])([-\\/\\._]?)(30|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(0?2)([-\\/\\._]?)(2[0-8]|1[0-9]|0?[1-9])$)|(^([2468][048]00)([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([3579][26]00)"
+ "([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][0][48])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][0][48])([-\\/\\._]?)"
+ "(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][2468][048])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][2468][048])([-\\/\\._]?)(0?2)"
+ "([-\\/\\._]?)(29)$)|(^([1][89][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|"
+ "(^([2-9][0-9][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$))";
/***
* 日期正则 支持: YYYY-MM-DD
*/
public static final String DATE_FORMAT1 = "(([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8]))))|((([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[3579][26])00))-02-29)";
/**
* URL正则表达式 匹配 http www ftp
*/
public static final String URL = "^(http|www|ftp|)?(://)?(\\w+(-\\w+)*)(\\.(\\w+(-\\w+)*))*((:\\d+)?)(/(\\w+(-\\w+)*))*(\\.?(\\w)*)(\\?)?"
+ "(((\\w*%)*(\\w*\\?)*(\\w*:)*(\\w*\\+)*(\\w*\\.)*(\\w*&)*(\\w*-)*(\\w*=)*(\\w*%)*(\\w*\\?)*"
+ "(\\w*:)*(\\w*\\+)*(\\w*\\.)*"
+ "(\\w*&)*(\\w*-)*(\\w*=)*)*(\\w*)*)$";
/**
* 身份证正则表达式
*/
public static final String IDCARD = "((11|12|13|14|15|21|22|23|31|32|33|34|35|36|37|41|42|43|44|45|46|50|51|52|53|54|61|62|63|64|65)[0-9]{4})"
+ "(([1|2][0-9]{3}[0|1][0-9][0-3][0-9][0-9]{3}"
+ "[Xx0-9])|([0-9]{2}[0|1][0-9][0-3][0-9][0-9]{3}))";
/**
* 机构代码
*/
public static final String JIGOU_CODE = "^[A-Z0-9]{8}-[A-Z0-9]$";
/**
* 匹配数字组成的字符串 ^[0-9]+$
*/
public static final String STR_NUM = "^[0-9]+$";
// ------------------ validation methods
/**
* 判断字段是否为空 符合返回ture
* @param str
* @return boolean
*/
public static synchronized boolean StrisNull(String str) {
return null == str || str.trim().length() <= 0 ? true : false;
}
/**
* 判断字段是非空 符合返回ture
* @param str
* @return boolean
*/
public static boolean StrNotNull(String str) {
return !StrisNull(str);
}
/**
* 字符串null转空
* @param str
* @return boolean
*/
public static String nulltoStr(String str) {
return StrisNull(str) ? "" : str;
}
/**
* 字符串null赋值默认值
* @param str 目标字符串
* @param defaut 默认值
* @return String
*/
public static String nulltoStr(String str, String defaut) {
return StrisNull(str) ? defaut : str;
}
/**
* 判断字段是否为Email 符合返回ture
* @param str
* @return boolean
*/
public static boolean isEmail(String str) {
return Regular(str, EMAIL);
}
/**
* 判断是否为电话号码 符合返回ture
* @param str
* @return boolean
*/
public static boolean isPhone(String str) {
return Regular(str, PHONE);
}
/**
* 判断是否为手机号码 符合返回ture
* @param str
* @return boolean
*/
public static boolean isMobile(String str) {
return RegularSJHM(str, MOBILE);
}
/**
* 判断是否为Url 符合返回ture
* @param str
* @return boolean
*/
public static boolean isUrl(String str) {
return Regular(str, URL);
}
/**
* 判断字段是否为数字 正负整数 正负浮点数 符合返回ture
* @param str
* @return boolean
*/
public static boolean isNumber(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为INTEGER 符合返回ture
* @param str
* @return boolean
*/
public static boolean isInteger(String str) {
return Regular(str, INTEGER);
}
/**
* 判断字段是否为正整数正则表达式 >=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isINTEGER_NEGATIVE(String str) {
return Regular(str, INTEGER_NEGATIVE);
}
/**
* 判断字段是否为负整数正则表达式 <=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isINTEGER_POSITIVE(String str) {
return Regular(str, INTEGER_POSITIVE);
}
/**
* 判断字段是否为DOUBLE 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDouble(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为正浮点数正则表达式 >=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDOUBLE_NEGATIVE(String str) {
return Regular(str, DOUBLE_NEGATIVE);
}
/**
* 判断字段是否为负浮点数正则表达式 <=0 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDOUBLE_POSITIVE(String str) {
return Regular(str, DOUBLE_POSITIVE);
}
/**
* 判断字段是否为日期 符合返回ture
* @param str
* @return boolean
*/
public static boolean isDate(String str) {
return Regular(str, DATE_ALL);
}
/**
* 验证2010-12-10
* @param str
* @return
*/
public static boolean isDate1(String str) {
return Regular(str, DATE_FORMAT1);
}
/**
* 判断字段是否为年龄 符合返回ture
* @param str
* @return boolean
*/
public static boolean isAge(String str) {
return Regular(str, AGE);
}
/**
* 判断字段是否超长 字串为空返回fasle, 超过长度{leng}返回ture 反之返回false
* @param str
* @param leng
* @return boolean
*/
public static boolean isLengOut(String str, int leng) {
return StrisNull(str) ? false : str.trim().length() > leng;
}
/**
* 判断字段是否为身份证 符合返回ture
* @param str
* @return boolean
*/
public static boolean isIdCard(String str) {
if (StrisNull(str))
return false;
if (str.trim().length() == 15 || str.trim().length() == 18) {
return Regular(str, IDCARD);
} else {
return false;
}
}
/**
* 判断字段是否为邮编 符合返回ture
* @param str
* @return boolean
*/
public static boolean isCode(String str) {
return Regular(str, CODE);
}
/**
* 判断字符串是不是全部是英文字母
* @param str
* @return boolean
*/
public static boolean isEnglish(String str) {
return Regular(str, STR_ENG);
}
/**
* 判断字符串是不是全部是英文字母+数字
* @param str
* @return boolean
*/
public static boolean isENG_NUM(String str) {
return Regular(str, STR_ENG_NUM);
}
/**
* 判断字符串是不是全部是英文字母+数字+下划线
* @param str
* @return boolean
*/
public static boolean isENG_NUM_(String str) {
return Regular(str, STR_ENG_NUM_);
}
/**
* 过滤特殊字符串 返回过滤后的字符串
* @param str
* @return boolean
*/
public static String filterStr(String str) {
Pattern p = Pattern.compile(STR_SPECIAL);
Matcher m = p.matcher(str);
return m.replaceAll("").trim();
}
/**
* 校验机构代码格式
* @return
*/
public static boolean isJigouCode(String str) {
return Regular(str, JIGOU_CODE);
}
/**
* 判断字符串是不是数字组成
* @param str
* @return boolean
*/
public static boolean isSTR_NUM(String str) {
return Regular(str, STR_NUM);
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean Regular(String str, String pattern) {
if (null == str || str.trim().length() <= 0)
return false;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean RegularSJHM(String str, String pattern) {
if (null == str || str.trim().length() <= 0){
return false;
}
if(str.contains("+86")){
str=str.replace("+86","");
}
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* description: match a timestamp in yyyyMMddHHmmss format
* @param time
* @return boolean
*/
public static final String yyyyMMddHHmmss = "[0-9]{14}";
public static boolean isyyyyMMddHHmmss(String time) {
if (time == null) {
return false;
}
boolean bool = time.matches(yyyyMMddHHmmss);
return bool;
}
/**
* description: match a MAC address in AA-BB-CC-DD-EE-FF format
* @param mac
* @return boolean
*/
public static final String isMac = "^[A-F0-9]{2}(-[A-F0-9]{2}){5}$";
public static boolean isMac(String mac) {
if (mac == null) {
return false;
}
boolean bool = mac.matches(isMac);
return bool;
}
/**
* description: match a 10-digit Unix timestamp (seconds)
* @param timestamp
* @return boolean
*/
public static final String longtime = "[0-9]{10}";
public static boolean isTimestamp(String timestamp) {
if (timestamp == null) {
return false;
}
boolean bool = timestamp.matches(longtime);
return bool;
}
/**
* Check whether the field is a valid datatype code; returns true if it matches
* @param str
* @return boolean
*/
public static final String DATATYPE = "^\\d{7}$";
public static boolean isDATATYPE(String str) {
return Regular(str, DATATYPE);
}
/**
* Check whether the field is a QQ number; returns true if it matches
* @param str
* @return boolean
*/
public static final String QQ = "^\\d{5,15}$";
public static boolean isQQ(String str) {
return Regular(str, QQ);
}
/**
* Check whether the field is an IMSI; returns true if it matches
* @param str
* @return boolean
*/
//public static final String IMSI = "^4600[0,1,2,3,4,5,6,7,9]\\d{10}|(46011|46020)\\d{10}$";
public static final String IMSI = "^[1-9][0-9][0-9]0[0,1,2,3,4,5,6,7,9]\\d{10}|[1-9][0-9][0-9](11|20)\\d{10}$";
public static boolean isIMSI(String str) {
return Regular(str, IMSI);
}
/**
* Check whether the field is an IMEI; returns true if it matches
* @param str
* @return boolean
*/
public static final String IMEI = "^\\d{8}$|^[a-fA-F0-9]{14}$|^\\d{15}$";
public static boolean isIMEI(String str) {return Regular(str, IMEI);}
/**
* Check whether the field is a valid capture time; returns true if it matches
* @param str
* @return boolean
*/
public static final String CAPTURETIME = "^\\d{10}|(20[0-9][0-9])\\d{10}$";
public static boolean isCAPTURETIME(String str) {return Regular(str, CAPTURETIME);}
/**
* description: check the authentication type code
* @param str
* @return boolean
*/
public static final String AUTH_TYPE = "^\\d{7}$";
public static boolean isAUTH_TYPE(String str) {return Regular(str, AUTH_TYPE);}
/**
* description: check the FIRM_CODE
* @param str
* @return boolean
*/
public static final String FIRM_CODE = "^\\d{9}$";
public static boolean isFIRM_CODE(String str) {return Regular(str, FIRM_CODE);}
/**
* description: check a longitude value
* @param str
* @return boolean
*/
public static final String LONGITUDE = "^-?(([1-9]\\d?)|(1[0-7]\\d)|180)(\\.\\d{1,8})?$";
//public static final String LONGITUDE ="^([-]?(\\d|([1-9]\\d)|(1[0-7]\\d)|(180))(\\.\\d*)\\,[-]?(\\d|([1-8]\\d)|(90))(\\.\\d*))$";
public static boolean isLONGITUDE(String str) {return Regular(str, LONGITUDE);}
/**
* description: check a latitude value
* @param str
* @return boolean
*/
public static final String LATITUDE = "^-?(([1-8]\\d?)|([1-8]\\d)|90)(\\.\\d{1,8})?$";
public static boolean isLATITUDE(String str) {return Regular(str, LATITUDE);}
public static void main(String[] args) {
boolean bool = isLATITUDE("26.0615854");
System.out.println(bool);
}
}
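A few quick checks showing how the validators above behave on well-formed input:

```java
import com.hsiehchou.common.regex.Validation;

public class ValidationDemo {
    public static void main(String[] args) {
        System.out.println(Validation.isMobile("13800138000"));     // true: 11-digit mobile number
        System.out.println(Validation.isIMEI("123456789012345"));   // true: 15 digits
        System.out.println(Validation.isMac("AA-BB-CC-DD-EE-FF"));  // true: upper-case, dash-separated
        System.out.println(Validation.isLONGITUDE("104.0665"));     // true
        System.out.println(Validation.isLATITUDE("30.5728"));       // true
    }
}
```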
11. thread/ThreadPoolManager.java - thread pool manager singleton
package com.hsiehchou.common.thread;
import java.io.Serializable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
* Thread pool manager singleton.
* By default it provides a newCachedThreadPool (a cached thread pool).
* A fixed-size pool (newFixedThreadPool) can be obtained by passing a thread count.
*/
public class ThreadPoolManager implements Serializable {
private static final long serialVersionUID = 1465361469484903956L;
public static final ThreadPoolManager threadPoolManager = new ThreadPoolManager();
private static ThreadPoolManager tpm;
private transient ExecutorService newCachedThreadPool;
private transient ExecutorService newFixedThreadPool;
private int poolCapacity;
private ThreadPoolManager(){
if( newCachedThreadPool == null )
newCachedThreadPool = Executors.newCachedThreadPool();
}
@Deprecated
public static ThreadPoolManager getInstance(){
if( tpm == null ){
synchronized(ThreadPoolManager.class){
if( tpm == null )
tpm = new ThreadPoolManager();
}
}
return tpm;
}
/**
* Returns the shared newCachedThreadPool
*/
public ExecutorService getExecutorService(){
if( newCachedThreadPool == null ){
synchronized(ThreadPoolManager.class){
if( newCachedThreadPool == null )
newCachedThreadPool = Executors.newCachedThreadPool();
}
}
return newCachedThreadPool;
}
/**
* Returns a newFixedThreadPool with the given capacity
*/
public ExecutorService getExecutorService(int poolCapacity){
return getExecutorService(poolCapacity, false);
}
/**
* Returns a newFixedThreadPool with the given capacity, optionally shutting down the old pool
*/
public synchronized ExecutorService getExecutorService(int poolCapacity, boolean closeOld){
if(newFixedThreadPool == null || (this.poolCapacity != poolCapacity)){
if(newFixedThreadPool != null && closeOld){
newFixedThreadPool.shutdown();
}
newFixedThreadPool = Executors.newFixedThreadPool(poolCapacity);
this.poolCapacity = poolCapacity;
}
return newFixedThreadPool;
}
}
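Usage sketch: obtain the shared cached pool and submit a task (JDK 1.8 lambda).

```java
import com.hsiehchou.common.thread.ThreadPoolManager;

import java.util.concurrent.ExecutorService;

public class ThreadPoolDemo {
    public static void main(String[] args) {
        ExecutorService pool = ThreadPoolManager.threadPoolManager.getExecutorService();
        pool.submit(() -> System.out.println("running in " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```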
12. time/TimeTranstationUtils.java - time conversion utility class
package com.hsiehchou.common.time;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
/**
* Description: time conversion utility class
*/
public class TimeTranstationUtils {
private static final Logger logger = LoggerFactory.getLogger(TimeTranstationUtils.class);
/* private static SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
private static SimpleDateFormat sdFormatternew = new SimpleDateFormat("yyyyMMddHH");
private static SimpleDateFormat sdFormatter1 = new SimpleDateFormat("yyyy-MM-dd");
private static SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private static SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");*/
private static Date nowTime;
public static String Date2yyyyMMddHHmmss() {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter.format(nowTime);
return time;
}
public static String Date2yyyyMMddHHmmss(long timestamp) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
nowTime = new Date(timestamp);
String time = sdFormatter.format(nowTime);
return time;
}
public static String Date2yyyyMMdd(long timestamp) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMdd");
nowTime = new Date(timestamp);
String time = sdFormatter.format(nowTime);
return time;
}
public static String Date2yyyyMMddHH(String str) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
SimpleDateFormat sdFormatternew = new SimpleDateFormat("yyyyMMddHH");
try {
nowTime = sdFormatter.parse(str);
} catch (ParseException e) {
e.printStackTrace();
}
String time = sdFormatternew.format(nowTime);
return time;
}
public static String Date2yyyy_MM_dd() {
SimpleDateFormat sdFormatter1 = new SimpleDateFormat("yyyy-MM-dd");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter1.format(nowTime);
return time;
}
public static String Date2yyyy_MM_dd_HH_mm_ss() {
SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter2.format(nowTime);
return time;
}
public static String Date2yyyyMMdd() {
SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");
nowTime = new Date(System.currentTimeMillis());
String time = sdFormatter3.format(nowTime);
return time;
}
public static String Date2yyyyMMdd(String str) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");
try {
nowTime = sdFormatter.parse(str);
} catch (ParseException e) {
e.printStackTrace();
}
String time = sdFormatter3.format(nowTime);
return time;
}
public static Long Date2yyyyMMddHHmmssToLong() {
return System.currentTimeMillis() / 1000;
}
public static String long2date(String capturetime){
SimpleDateFormat sdf= new SimpleDateFormat("yyyyMMdd");
//the input is in seconds; multiply by 1000 to get milliseconds, then convert to java.util.Date
Date dt = new Date(Long.valueOf(capturetime) * 1000);
String sDateTime = sdf.format(dt); //formatted as yyyyMMdd
return sDateTime;
}
public static Long yyyyMMddHHmmssToLong(String time) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
if (StringUtils.isBlank(time)) {
return 0L;
} else {
boolean isNum = time.matches("[0-9]+");
if (isNum) {
long long1 = 0;
try {
long1 = sdFormatter.parse(time).getTime();
} catch (ParseException e) {
logger.error(time + "时间转换为long错误" + isNum);
return 0L;
}
return long1 / 1000;
}
}
return 0L;
}
public static Date yyyyMMddHHmmssToDate(String time) {
SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
if (StringUtils.isBlank(time)) {
return new Date();
} else {
boolean isNum = time.matches("[0-9]+");
if (isNum) {
Date date = null;
try {
date = sdFormatter.parse(time);
} catch (ParseException e) {
logger.error(time + "时间转换为date错误" + isNum, e);
System.out.println(time);
System.out.println(isNum);
e.printStackTrace();
}
return date;
}
}
return new Date();
}
public static Date yyyyMMddHHmmssToDate() {
Date date = null;
SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
try {
date = sdFormatter2.parse(Date2yyyy_MM_dd_HH_mm_ss());
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return date;
}
public static java.sql.Date strToDate(String strDate) {
String str = strDate;
SimpleDateFormat format = new SimpleDateFormat("yyyy-mm-dd");
Date d = null;
try {
d = format.parse(str);
} catch (Exception e) {
e.printStackTrace();
}
java.sql.Date date = new java.sql.Date(d.getTime());
return date;
}
public static Long str2Long(String str){
if(!StringUtils.isBlank(str)){
return Long.valueOf(str);
}else{
return 0L;
}
}
public static Double str2Double(String str){
if(!StringUtils.isBlank(str)){
return Double.valueOf(str);
}else{
return 0.0;
}
}
public static HashMap<String,Object> mapString2Long(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (!StringUtils.isBlank(logouttime)) {
objectMap.put(key, Long.valueOf(logouttime));
} else {
objectMap.put(key, 0L);
}
return objectMap;
}
public static void main(String[] args) throws InterruptedException {
System.out.println(long2date("1463487992"));
}
}
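A usage sketch for the conversions used most often later in the project:

```java
import com.hsiehchou.common.time.TimeTranstationUtils;

public class TimeDemo {
    public static void main(String[] args) {
        // current time as yyyyMMddHHmmss
        System.out.println(TimeTranstationUtils.Date2yyyyMMddHHmmss());
        // yyyyMMddHHmmss string -> seconds since the epoch
        System.out.println(TimeTranstationUtils.yyyyMMddHHmmssToLong("20190101120000"));
        // 10-digit second timestamp -> yyyyMMdd
        System.out.println(TimeTranstationUtils.long2date("1463487992"));
    }
}
```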
IV. Resources module development
Structure of xz_bigdata_resources
Note: right-click the resources directory here and choose Mark Directory as > Resources Root so it becomes a resources source folder; the configuration files in it can then be loaded from anywhere in the project.
1. Under resources
log4j2.properties
log4j.rootLogger = error,stdout,D,E
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = F://logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =F://logs/error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
2. common
datatype.properties
# base = datatype,idcard,name,age,collecttime,imei
# wechat = datatype,wechat,phone,collecttime,imei
wechat = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
mail = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,send_mail,send_time,accept_mail,accept_time,mail_content,mail_type
qq = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
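The sketch below shows how these field lists are meant to be used: one raw line from a collected file is split and zipped with the configured field names via DataTypeProperties from the Common module. The separator is an assumption; adapt it to the real collector output.

```java
import com.hsiehchou.common.project.datatype.DataTypeProperties;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class LineToMap {
    public static Map<String, String> toMap(String table, String line, String separator) {
        // field order comes from datatype.properties, e.g. table = "wechat"
        ArrayList<String> fields = DataTypeProperties.dataTypeMap.get(table);
        String[] values = line.split(separator, -1);
        Map<String, String> record = new HashMap<>();
        for (int i = 0; i < fields.size() && i < values.length; i++) {
            record.put(fields.get(i), values[i]);
        }
        return record;
    }
}
```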
mysql.properties
db_ip = 192.168.116.201
db_port = 3306
user = root
password = root
3. es
es_cluster.properties
es.cluster.name=xz_es
es.cluster.nodes = hadoop1,hadoop2,hadoop3
es.cluster.nodes1 = hadoop1
es.cluster.nodes2 = hadoop2
es.cluster.nodes3 = hadoop3
es.cluster.tcp.port = 9300
es.cluster.http.port = 9200
mapping/base.json
{
"_source": {
"enabled": true
},
"properties": {
"datatype":{"type": "keyword"},
"idcard":{"type": "keyword"},
"name":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"age":{"type": "long"},
"collecttime":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"imei":{"type": "keyword"}
}
}
mapping/fieldmapping.properties
tables = wechat,mail,qq
wechat.imei = string
wechat.imsi = string
wechat.longitude = double
wechat.latitude = double
wechat.phone_mac = string
wechat.device_mac = string
wechat.device_number = string
wechat.collect_time = long
wechat.username = string
wechat.phone = string
wechat.object_username = string
wechat.send_message = string
wechat.accept_message = string
wechat.message_time = long
wechat.id = string
wechat.table = string
wechat.filename = string
wechat.absolute_filename = string
mail.imei = string
mail.imsi = string
mail.longitude = double
mail.latitude = double
mail.phone_mac = string
mail.device_mac = string
mail.device_number = string
mail.collect_time = long
mail.send_mail = string
mail.send_time = long
mail.accept_mail = string
mail.accept_time = long
mail.mail_content = string
mail.mail_type = string
mail.id = string
mail.table = string
mail.filename = string
mail.absolute_filename = string
qq.imei = string
qq.imsi = string
qq.longitude = double
qq.latitude = double
qq.phone_mac = string
qq.device_mac = string
qq.device_number = string
qq.collect_time = long
qq.username = string
qq.phone = string
qq.object_username = string
qq.send_message = string
qq.accept_message = string
qq.message_time = long
qq.id = string
qq.table = string
qq.filename = string
qq.absolute_filename = string
mapping/mail.json
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"send_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_time":{"type": "long"},
"accept_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_time":{"type": "long"},
"mail_content":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"mail_type":{"type": "keyword"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
mapping/qq.json
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
mapping/test.json
{
"_source": {
"enabled": true
},
"properties": {
"id":{"type": "keyword"},
"source":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"target":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"library_id":{"type": "long"},
"source_sign":{"type": "keyword"},
"target_sign":{"type": "keyword"},
"create_time":{"type": "long"},
"create_user_id":{"type": "keyword"},
"is_audit":{"type": "long"},
"is_del":{"type": "long"},
"last_modify_user_id":{"type": "keyword"},
"last_modify_time":{"type": "long"},
"init_version":{"type": "long"},
"version":{"type": "long"},
"score":{"type": "keyword"},
"level":{"type": "keyword"},
"example":{"type": "keyword"},
"conflict":{"type": "keyword"},
"srcLangId":{"type": "long"},
"srcLangCN":{"type": "keyword"},
"tarLangId":{"type": "long"},
"tarLangCN":{"type": "keyword"},
"docId":{"type": "keyword"},
"source_simhash":{"type": "keyword"},
"sentence_id":{"type": "long"},
"section_id":{"type": "long"},
"type":{"type": "long"},
"industry":{"type": "long"},
"industry_name":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"querycount":{"type": "long"},
"reviser":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"comment":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
mapping/wechat.json
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
4. flume
datatype.properties
flume-config.properties
#kafka topic
kafkatopic=test100
validation.properties
# file name validation switch
FILENAME_VALIDATION=1
# DATATYPE conversion switch
DATATYPE_TRANSACTION=1
# longitude/latitude validation switch
LONGLAIT_VALIDATION=1
# whether error records are written to ES
ERROR_ES=1
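These switches can be read with ConfigUtil from the Common module, as in the sketch below; "1" means the switch is on, and the classpath location flume/validation.properties is an assumption based on the directory layout above.

```java
import com.hsiehchou.common.config.ConfigUtil;

import java.util.Properties;

public class ValidationSwitches {
    public static void main(String[] args) {
        Properties p = ConfigUtil.getInstance().getProperties("flume/validation.properties");
        boolean validateFileName = "1".equals(p.getProperty("FILENAME_VALIDATION"));
        boolean writeErrorsToEs = "1".equals(p.getProperty("ERROR_ES"));
        System.out.println("file name validation on: " + validateFileName);
        System.out.println("write error records to ES: " + writeErrorsToEs);
    }
}
```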
5. hadoop
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<property>
<name>hadoop.security.instrumentation.requires.admin</name>
<value>false</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>hadoop.ssl.enabled</name>
<value>false</value>
</property>
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///dfs/nn</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>hadoop1:8022</value>
</property>
<property>
<name>dfs.https.address</name>
<value>hadoop1:50470</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>022</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>dfs.client.domain.socket.data.traffic</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
</configuration>
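以上 core-site.xml / hdfs-site.xml 就是客户端访问 HDFS 所依赖的配置。下面是一个读取这些配置并列出 HDFS 根目录的最小示意(假设把这两个 xml 放到 classpath 下即可自动加载,或者像示例那样直接指定 fs.defaultFS;HdfsClientDemo 类名与路径仅为演示,并非项目中的实际代码):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        // core-site.xml / hdfs-site.xml 放在 classpath(如 resources 目录)下会被自动加载
        Configuration conf = new Configuration();
        // 也可以直接指定 NameNode 地址,与 core-site.xml 中 fs.defaultFS 保持一致
        conf.set("fs.defaultFS", "hdfs://hadoop1:8020");
        FileSystem fs = FileSystem.get(conf);
        // 列出根目录,验证客户端配置是否生效(路径仅作演示)
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}
```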
6、hbase
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<property>
<name>hadoop.security.instrumentation.requires.admin</name>
<value>false</value>
</property>
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
</configuration>
hbase-server-config.properties
#hbase 开发环境
need.init.hbase=true
# hbase.zookeeper.quorum=hadoop1.ultiwill.com,hadoop2.ultiwill.com,hadoop3.ultiwill.com
hbase.zookeeper.quorum=hadoop1,hadoop2,hadoop3
hbase.zookeeper.property.clientPort=2181
hbase.rpc.timeout=120000
hbase.client.scanner.timeout.period=120000
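hbase-server-config.properties 中的 zookeeper 地址和超时参数,最终都要设置到 HBase 客户端的 Configuration 中。下面是一个按这些配置建立连接并列出表名的最小示意(HBaseConnDemo 类名仅为演示,并非 xz_bigdata_hbase 模块的实际实现):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // 与 hbase-server-config.properties 中的配置保持一致
        conf.set("hbase.zookeeper.quorum", "hadoop1,hadoop2,hadoop3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.rpc.timeout", "120000");
        conf.set("hbase.client.scanner.timeout.period", "120000");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // 简单验证连接:打印已有表名
            for (TableName name : admin.listTableNames()) {
                System.out.println(name.getNameAsString());
            }
        }
    }
}
```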
hbase-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop1:8020/hbase</value>
</property>
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
<property>
<name>hbase.client.write.buffer</name>
<value>2097152</value>
</property>
<property>
<name>hbase.client.pause</name>
<value>100</value>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>35</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>100</value>
</property>
<property>
<name>hbase.client.keyvalue.maxsize</name>
<value>10485760</value>
</property>
<property>
<name>hbase.ipc.client.allowsInterrupt</name>
<value>true</value>
</property>
<property>
<name>hbase.client.primaryCallTimeout.get</name>
<value>10</value>
</property>
<property>
<name>hbase.client.primaryCallTimeout.multiget</name>
<value>10</value>
</property>
<property>
<name>hbase.fs.tmp.dir</name>
<value>/user/${user.name}/hbase-staging</value>
</property>
<property>
<name>hbase.client.scanner.timeout.period</name>
<value>60000</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
</property>
<property>
<name>hbase.regionserver.thrift.http</name>
<value>false</value>
</property>
<property>
<name>hbase.thrift.support.proxyuser</name>
<value>false</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>60000</value>
</property>
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
<property>
<name>hbase.snapshot.master.timeoutMillis</name>
<value>60000</value>
</property>
<property>
<name>hbase.snapshot.region.timeout</name>
<value>60000</value>
</property>
<property>
<name>hbase.snapshot.master.timeout.millis</name>
<value>60000</value>
</property>
<property>
<name>hbase.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hbase.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>60000</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>zookeeper.znode.rootserver</name>
<value>root-region-server</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop1,hadoop3,hadoop2</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.rest.ssl.enabled</name>
<value>false</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///dfs/nn</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>hadoop1:8022</value>
</property>
<property>
<name>dfs.https.address</name>
<value>hadoop1:50470</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>022</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>dfs.client.domain.socket.data.traffic</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
</configuration>
7、kafka
kafka-data-push-info
--config kafka自动推送数据配置目录
--timeOut 推送超时时间 默认 15 min 单位为分钟
kafka自动推送数据配置:
data.sources 数据源列表。 (例如:data.sources =bhdb1,dpxx)
{source}.source.type 某个数据源的类型。 (数据源分为数据库和文件两大类, 若为数据库 则使用 数据的名称 例如 oracle,mysql,sqlserver等, 否则使用 file)
例如:bhdb1.source.type=oracle 或者 dpxx.source.type=file
数据源为数据库:
{source}.db.name 数据库的名称
{source}.db.host 数据库的ip或者主机名
{source}.db.port 数据库的访问端口, 若不填写则使用该种数据库的默认端口
{source}.db.user 用户名
{source}.db.pwd 密码
{source}.push.topic 推送到topic的全局配置,即该数据库下配置的表没有配置topic的时候,其数据会推送到该topic。
{source}.push.tables 需要推送数据的表列表
{source}.{table}.push.sql 只推送使用该sql查询到的数据 。 不填则表示推送全部。
{source}.{table}.push.adjusterfactory 对推送的数据进行调整 , 必须为com.bh.d406.bigdata.kafka.producer.DataAdjuster的子类 , 需要进行调整数据的时候填写
{source}.{table}.push.topic 该表的数据推送到topic名称 , 若不填则使用全局的topic配置
数据源为文件:
{source}.file.dir 文件目录 (注意:只支持本地目录 )
{source}.file.encoding 文件编码 (默认UTF-8)
{source}.file.extensions 需要过滤的文件格式列表
{source}.file.data.loaderfactory 文件加载器工厂类
{source}.file.data.fields 记录的字段列表 与顺序有关
{source}.file.data.spliter 数据的分割符 默认 \t
{source}.file.skip.firstline 是否跳过第一行数据 false or true
{source}.file.data.adjusterfactory 数据矫正工厂类
{source}.push.thread.num 读取文件的线程数
{source}.push.batch.size 分批推送数据 , 每批数据大小
{source}.push.topic 数据推送的目标topic名称
{source}.store.table 存储的表名
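上面这套 {source}.* 的配置约定,核心是先读 data.sources 拿到数据源列表,再按数据源名作为前缀取各自的参数。下面是一个解析该约定的最小示意(配置文件名与 PushConfigDemo 类名均为假设,真实解析逻辑以推送程序为准):

```java
import java.io.FileInputStream;
import java.util.Properties;

public class PushConfigDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 配置文件路径由 --config 指定,这里用第一个命令行参数代替,文件名仅为假设
        String path = args.length > 0 ? args[0] : "kafka-data-push.properties";
        props.load(new FileInputStream(path));
        // data.sources 为数据源列表,逗号分隔
        for (String source : props.getProperty("data.sources", "").split(",")) {
            source = source.trim();
            if (source.isEmpty()) {
                continue;
            }
            String type = props.getProperty(source + ".source.type");
            String topic = props.getProperty(source + ".push.topic");
            if ("file".equals(type)) {
                System.out.println(source + " -> 文件目录 " + props.getProperty(source + ".file.dir") + ", topic=" + topic);
            } else {
                // 其余情况按数据库处理,type 即数据库名称(oracle、mysql 等)
                System.out.println(source + " -> " + type + " 库 " + props.getProperty(source + ".db.name")
                        + "@" + props.getProperty(source + ".db.host") + ", topic=" + topic);
            }
        }
    }
}
```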
kafka-server-config.properties
#################Kafka 全局配置 #######################
# 格式为host1:port1,host2:port2,
# 这是一个broker列表,用于获得元数据(topics,partitions和replicas),建立起来的socket连接用于发送实际数据,
# 这个列表可以是broker的一个子集,或者一个VIP,指向broker的一个子集
# metadata.broker.list=hadoop1:9092,slaver01:9092,slaver02:9092
metadata.broker.list=hadoop1:9092
# zookeeper列表
zk.connect=hadoop1:2181,hadoop2:2181,hadoop3:2181
# 消息的序列化类,默认的encoder处理一个byte[],返回一个byte[]
# 默认值为 kafka.serializer.DefaultEncoder
serializer.class=kafka.serializer.StringEncoder
# 用来控制一个produce请求怎样才能算完成,准确的说,是有多少broker必须已经提交数据到log文件,并向leader发送ack,可以设置如下的值:
# 0,意味着producer永远不会等待一个来自broker的ack,这就是0.7版本的行为。这个选项提供了最低的延迟,但是持久化的保证是最弱的,当server挂掉的时候会丢失一些数据。
# 1,意味着在leader replica已经接收到数据后,producer会得到一个ack。这个选项提供了更好的持久性,因为在server确认请求成功处理后,client才会返回。只有在数据刚写到leader、还没来得及复制时leader就挂掉,消息才可能丢失。
# -1,意味着在所有的ISR都接收到数据后,producer才得到一个ack。这个选项提供了最好的持久性,只要还有一个replica存活,那么数据就不会丢失。
# 默认值 为 0
request.required.acks=1
# 请求超时时间 默认为 10000
request.timeout.ms=60000
#决定消息是否应在一个后台线程异步发送。
#合法的值为async,表示异步发送;sync表示同步发送。
#设置为async则允许批量发送请求,这会带来更高的吞吐量,但是client的机器挂了的话会丢失还没有发送的数据。
#默认值为 sync
producer.type=sync
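kafka-server-config.properties 中这些键对应的是 Kafka 0.8 的旧版 producer API。下面是一个按同样参数构造 producer 并发送一条消息的最小示意(topic 名称 test_topic 仅为假设):

```java
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class OldProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // 与 kafka-server-config.properties 保持一致
        props.put("metadata.broker.list", "hadoop1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        props.put("request.timeout.ms", "60000");
        props.put("producer.type", "sync");
        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        // topic 名称仅为示例
        producer.send(new KeyedMessage<String, String>("test_topic", "hello kafka"));
        producer.close();
    }
}
```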
8、redis
redis.properties
redis.hostname = 192.168.116.202
redis.port = 6379
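redis.properties 只有主机和端口两项。下面是一个用 Jedis 客户端按该配置连接 Redis 的最小示意(假设 xz_bigdata_redis 模块使用的是 Jedis,示例类名与 key 仅为演示):

```java
import redis.clients.jedis.Jedis;

public class RedisDemo {
    public static void main(String[] args) {
        // 与 redis.properties 中的 redis.hostname / redis.port 保持一致
        Jedis jedis = new Jedis("192.168.116.202", 6379);
        jedis.set("demo:key", "hello redis");
        System.out.println(jedis.get("demo:key"));
        jedis.close();
    }
}
```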
9、spark
hive_fields_mapping.properties
datatype= base,wechat
#base = datatype,idcard,name,age,collecttime,imei
#wechat = datatype,wechat,phone,collecttime,imei
#============================================================base
base.datatype = string
base.idcard = string
base.name = string
base.age = long
base.collecttime = string
base.imei = string
#============================================================wechat
wechat.datatype = string
wechat.wechat = string
wechat.phone = string
wechat.collecttime = string
wechat.imei = string
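hive_fields_mapping.properties 按“数据类型.字段名 = 字段类型”的方式描述每张 Hive 表的结构。下面是一个读取该映射并拼出建表语句的最小示意(仅为说明配置含义,字段顺序和真实建表逻辑以 xz_bigdata_spark 模块为准):

```java
import java.io.InputStream;
import java.util.Properties;

public class HiveMappingDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 假设 hive_fields_mapping.properties 位于 classpath 下
        try (InputStream in = HiveMappingDemo.class.getClassLoader()
                .getResourceAsStream("hive_fields_mapping.properties")) {
            props.load(in);
        }
        // datatype = base,wechat,按数据类型逐个拼出建表语句
        for (String type : props.getProperty("datatype").split(",")) {
            type = type.trim();
            StringBuilder ddl = new StringBuilder("CREATE TABLE IF NOT EXISTS " + type + " (");
            String prefix = type + ".";
            boolean first = true;
            // stringPropertyNames 不保证顺序,真实建表时字段顺序需要另行约定
            for (String key : props.stringPropertyNames()) {
                if (key.startsWith(prefix)) {
                    if (!first) {
                        ddl.append(", ");
                    }
                    ddl.append(key.substring(prefix.length())).append(" ").append(props.getProperty(key).trim());
                    first = false;
                }
            }
            ddl.append(")");
            System.out.println(ddl);
        }
    }
}
```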
relation.properties
#需要关联的字段
relationfield = phone_mac,phone,username,send_mail,imei,imsi
complex_relationfield = card,phone_mac,phone,username,send_mail,imei,imsi
spark-batch-config.properties
# spark 常规 配置 不包括 流式处理的 配置
#################### 全局 #############################
# 在用户没有指定时,用于分布式shuffle操作(groupByKey,reduceByKey等等)的默认任务数(shuffle过程中task的个数)
# 默认为 8
spark.default.parallelism=16
# Spark用于缓存的内存大小所占用的Java堆的比率。这个不应该大于JVM中老年代所分配的内存大小
# 默认情况下老年代大小是堆大小的2/3,但是你可以通过配置你的老年代的大小,然后再去增加这个比率
# 默认为 0.66
# spark 1.6 后 过期
# spark.storage.memoryFraction=0.66
# 在spark1.6.0版本默认大小为: (“Java Heap” – 300MB) * 0.75
# 例如:如果堆内存大小有4G,将有2847MB的Spark Memory,Spark Memory=(4*1024MB-300)*0.75=2847MB
# 这部分内存会被分成两部分:Storage Memory和Execution Memory
# 而且这两部分的边界由spark.memory.storageFraction参数设定,默认是0.5即50%
# 新的内存管理模型中的优点是,这个边界不是固定的,在内存压力下这个边界是可以移动的
# 如一个区域内存不够用时可以从另一区域借用内存
spark.memory.fraction=0.75
spark.memory.storageFraction=0.5
# 是否要压缩序列化的RDD分区(比如,StorageLevel.MEMORY_ONLY_SER)
# 在消耗一点额外的CPU时间的代价下,可以极大的提高减少空间的使用
# 默认为 false
spark.rdd.compress=true
# The codec used to compress internal data such as RDD partitions,
# broadcast variables and shuffle outputs. By default,
# Spark provides three codecs: lz4, lzf, and snappy. You can also use fully qualified class names to specify the codec,
# e.g.
# 1. org.apache.spark.io.LZ4CompressionCodec,
# 2. org.apache.spark.io.LZFCompressionCodec,
# 3. org.apache.spark.io.SnappyCompressionCodec. default
spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
# Block size (in bytes) used in Snappy compression,
# in the case when Snappy compression codec is used.
# Lowering this block size will also lower shuffle memory usage when Snappy is used.
# default : 32K
spark.io.compression.snappy.blockSize=32768
# 同时获取每一个分解任务的时候,映射输出文件的最大的尺寸(以兆为单位)。
# 由于对每个输出都需要我们去创建一个缓冲区去接受它,这个属性值代表了对每个分解任务所使用的内存的一个上限值,
# 因此除非你机器内存很大,最好还是配置一下这个值。
# 默认48
spark.reducer.maxSizeInFlight=48
# 这个配置参数仅适用于HashShuffleMananger的实现,同样是为了解决生成过多文件的问题,
# 采用的方式是在不同批次运行的Map任务之间重用Shuffle输出文件,也就是说合并的是不同批次的Map任务的输出数据,
# 但是每个Map任务所需要的文件还是取决于Reduce分区的数量,因此,它并不减少同时打开的输出文件的数量,
# 因此对内存使用量的减少并没有帮助。只是HashShuffleManager里的一个折中的解决方案。
# 默认为false
#spark.shuffle.consolidateFiles=false
#java.io.Externalizable. Java serialization is flexible but often quite slow, and leads to large serialized formats for many classes.
#default java.io.Serializable
#spark.serializer=org.apache.spark.serializer.KryoSerializer
# Speculation是在任务调度的时候,如果没有适合当前本地性要求的任务可供运行,
# 将跑得慢的任务在空闲计算资源上再度调度的行为,这些参数调整这些行为的频率和判断指标,默认是不使用Speculation的
# 默认为false
# 慎用 可能导致数据重复的现象
#spark.speculation=true
# task失败重试次数
# 默认为4
spark.task.maxFailures=8
# Spark 是有任务的黑名单机制的,但是这个配置在官方文档里面并没有写,可以设置下面的参数,
# 比如设置成一分钟之内不要再把任务发到这个 Executor 上了,单位是毫秒。
# spark.scheduler.executorTaskBlacklistTime=60000
# 超过这个时间,可以执行 NODE_LOCAL 的任务
# 默认为 3000
spark.locality.wait.process=1
# 超过这个时间,可以执行 RACK_LOCAL 的任务
# 默认为 3000
spark.locality.wait.node=3
# 超过这个时间,可以执行 ANY 的任务
# 默认为 3000
spark.locality.wait.rack=1000
#################### yarn ###########################
# 提交的jar文件 的副本数
# 默认为 3
spark.yarn.submit.file.replication=1
# container中的线程数
# 默认为 25
spark.yarn.containerLauncherMaxThreads=25
# 解决yarn-cluster模式下 对处理 permGen space oom异常很有用
# spark.yarn.am.extraJavaOptions=
# spark.driver.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024M
# 对象指针压缩 和 gc日志收集打印
# spark.executor.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024M -XX:MaxDirectMemorySize=1536M -XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
# -XX:-UseGCOverheadLimit
# GC默认情况下有一个限制,默认是GC时间不能超过2%的CPU时间,但是如果大量对象创建(在Spark里很容易出现,代码模式就是一个RDD转下一个RDD),
# 就会导致大量的GC时间,从而出现OutOfMemoryError: GC overhead limit exceeded,可以通过设置-XX:-UseGCOverheadLimit关掉它。
# -XX:+UseCompressedOops 可以压缩指针(8字节变成4字节)
spark.executor.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024m -XX:+CMSClassUnloadingEnabled -Xmn512m -XX:MaxTenuringThreshold=15 -XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCompressedOops -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError
# 当shuffle缓存的数据超过此值 强制刷磁盘 单位为 byte
# spark.shuffle.spill.initialMemoryThreshold=671088640
################### AKKA 相关 ##########################
# 在控制层面通信(序列化任务和任务结果)时,消息大小的最大值,单位是MB。
# 如果你需要给驱动器发回大尺寸的结果(比如使用在一个大的数据集上面使用collect()方法),那么你就该增加这个值了。
# 默认为 10
spark.akka.frameSize=1024
# 用于通信的actor线程数量。如果驱动器有很多CPU核心,那么在大集群上可以增大这个值。
# 默认为 4
spark.akka.threads=8
# Spark节点之间通信的超时时间,以秒为单位
# 默认为20s
spark.akka.timeout=120
# exector的堆外内存(不会占用 分配给executor的jvm内存)
# spark.yarn.executor.memoryOverhead=2560
spark-start-config.properties
# Spark 任务 使用java -cp 方式启动的参数配置
#
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/spark-assembly.jar
spark.authenticate=false
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.yarn.historyServer.address=http://BH-LAN-Virtual-hadoop-9:18088
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.eventLog.enabled=true
spark.dynamicAllocation.schedulerBacklogTimeout=1
SPARK_SUBMIT=true
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.ui.killEnabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.executorIdleTimeout=60
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.shuffle.service.port=7337
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.dynamicAllocation.enabled=true
#/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/*
#/etc/spark/conf.cloudera.spark_on_yarn/
#/etc/hadoop/conf.cloudera.yarn/
spark.submit.deployMode=client
spark.app.name=default
spark.master=yarn-client
spark.driver.memory=1g
spark.executor.instances=1
spark.executor.memory=4g
spark.executor.cores=2
spark.jars=
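spark-start-config.properties 里的键基本都是标准的 spark.* 配置,用 java -cp 方式启动时需要自己把它们装载进 SparkConf。下面是一个装载并打印这些配置的最小示意(文件路径与类名仅为演示,并非项目启动器的实际实现):

```java
import java.io.FileInputStream;
import java.util.Properties;
import org.apache.spark.SparkConf;

public class SparkStartConfigDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // 配置文件路径仅为演示,实际由启动脚本传入
        props.load(new FileInputStream("spark-start-config.properties"));
        SparkConf conf = new SparkConf();
        for (String key : props.stringPropertyNames()) {
            // 只把 spark.* 开头的键放进 SparkConf,SPARK_SUBMIT 这类环境标记跳过
            if (key.startsWith("spark.")) {
                conf.set(key, props.getProperty(key));
            }
        }
        // 打印最终生效的配置,确认 master、executor 内存等是否符合预期
        System.out.println(conf.toDebugString());
    }
}
```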
spark-streaming-config.properties
# spark 流式处理的 配置
# job的并行度
# 默认为 1
spark.streaming.concurrentJobs=1
# Spark记忆任何元数据(stages生成,任务生成等等)的时间(秒)。周期性清除保证在这个时间之前的元数据会被遗忘。
#当长时间几小时,几天的运行Spark的时候设置这个是很有用的。注意:任何内存中的RDD只要过了这个时间就会被清除掉。
# 默认 disable
spark.cleaner.ttl=3600
# 将不再使用的缓存数据清除
# 默认为false
spark.streaming.unpersist=true
# 从网络中批量接受对象时的持续时间 , 单位 ms。
# 默认为200ms
spark.streaming.blockInterval=200
# 控制Receiver速度 单位 s
# 因为当streaming程序的数据源的数据量突然变大巨大,可能会导致streaming被撑住导致吞吐不过来,所以可以考虑对于最大吞吐做一下限制。
# 默认为 100000
spark.streaming.receiver.maxRate=10000
# kafka每个分区最大的读取速度 单位 s
# 控制kafka读取的量
spark.streaming.kafka.maxRatePerPartition=50
# 读取kafka的分区最新offset的最大尝试次数
# 默认为1
spark.streaming.kafka.maxRetries=5
# 1、为什么引入Backpressure
# 默认情况下,Spark Streaming通过Receiver以生产者生产数据的速率接收数据,计算过程中会出现batch processing time > batch interval的情况,
# 其中batch processing time 为实际计算一个批次花费时间, batch interval为Streaming应用设置的批处理间隔。
# 这意味着Spark Streaming的数据接收速率高于Spark从队列中移除数据的速率,也就是数据处理能力低,在设置间隔内不能完全处理当前接收速率接收的数据。
# 如果这种情况持续过长的时间,会造成数据在内存中堆积,导致Receiver所在Executor内存溢出等问题(如果设置StorageLevel包含disk, 则内存存放不下的数据会溢写至disk, 加大延迟)。
# Spark 1.5以前版本,用户如果要限制Receiver的数据接收速率,可以通过设置静态配制参数“spark.streaming.receiver.maxRate”的值来实现,
# 此举虽然可以通过限制接收速率,来适配当前的处理能力,防止内存溢出,但也会引入其它问题。比如:producer数据生产高于maxRate,当前集群处理能力也高于maxRate,这就会造成资源利用率下降等问题。
# 为了更好的协调数据接收速率与资源处理能力,Spark Streaming 从v1.5开始引入反压机制(back-pressure),通过动态控制数据接收速率来适配集群数据处理能力。
# 2、Backpressure
# Spark Streaming Backpressure: 根据JobScheduler反馈作业的执行信息来动态调整Receiver数据接收率。
# 通过属性“spark.streaming.backpressure.enabled”来控制是否启用backpressure机制,默认值false,即不启用
spark.streaming.backpressure.enabled=true
spark.streaming.backpressure.initialRate=200
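上面关于反压(back-pressure)的说明,落到代码上就是在构造 StreamingContext 之前把相关参数设置进 SparkConf。下面是一个最小示意(local[2] 与 socketTextStream 仅为本地演示,实际作业使用 yarn-client 与 Kafka 数据源):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BackpressureDemo {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("backpressure-demo")
                .setMaster("local[2]")   // 仅作本地演示,实际使用 yarn-client
                .set("spark.streaming.backpressure.enabled", "true")
                .set("spark.streaming.backpressure.initialRate", "200")
                .set("spark.streaming.kafka.maxRatePerPartition", "50");
        // 批处理间隔 5 秒,反压机制会根据上一批的处理情况动态调整接收速率
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
        jssc.socketTextStream("localhost", 9999).print();
        jssc.start();
        jssc.awaitTermination();
    }
}
```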
datatype/fieldtype.properties
hive/hive-server-config.properties
# hive 开发环境
hive/hive-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop1:9083</value>
</property>
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>300</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask.size</name>
<value>20971520</value>
</property>
<property>
<name>hive.optimize.bucketmapjoin.sortedmerge</name>
<value>false</value>
</property>
<property>
<name>hive.smbjoin.cache.rows</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.logging.operation.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/hadoop_log/log/hive/operation_logs</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>-1</value>
</property>
<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>67108864</value>
</property>
<property>
<name>hive.exec.copyfile.maxsize</name>
<value>33554432</value>
</property>
<property>
<name>hive.exec.reducers.max</name>
<value>1099</value>
</property>
<property>
<name>hive.vectorized.groupby.checkinterval</name>
<value>4096</value>
</property>
<property>
<name>hive.vectorized.groupby.flush.percent</name>
<value>0.1</value>
</property>
<property>
<name>hive.compute.query.using.stats</name>
<value>false</value>
</property>
<property>
<name>hive.vectorized.execution.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.vectorized.execution.reduce.enabled</name>
<value>false</value>
</property>
<property>
<name>hive.merge.mapfiles</name>
<value>true</value>
</property>
<property>
<name>hive.merge.mapredfiles</name>
<value>false</value>
</property>
<property>
<name>hive.cbo.enable</name>
<value>false</value>
</property>
<property>
<name>hive.fetch.task.conversion</name>
<value>minimal</value>
</property>
<property>
<name>hive.fetch.task.conversion.threshold</name>
<value>268435456</value>
</property>
<property>
<name>hive.limit.pushdown.memory.usage</name>
<value>0.1</value>
</property>
<property>
<name>hive.merge.sparkfiles</name>
<value>true</value>
</property>
<property>
<name>hive.merge.smallfiles.avgsize</name>
<value>16777216</value>
</property>
<property>
<name>hive.merge.size.per.task</name>
<value>268435456</value>
</property>
<property>
<name>hive.optimize.reducededuplication</name>
<value>true</value>
</property>
<property>
<name>hive.optimize.reducededuplication.min.reducer</name>
<value>4</value>
</property>
<property>
<name>hive.map.aggr</name>
<value>true</value>
</property>
<property>
<name>hive.map.aggr.hash.percentmemory</name>
<value>0.5</value>
</property>
<property>
<name>hive.optimize.sort.dynamic.partition</name>
<value>false</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>1369020825</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>966367641</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>1</value>
</property>
<property>
<name>spark.yarn.driver.memoryOverhead</name>
<value>102</value>
</property>
<property>
<name>spark.yarn.executor.memoryOverhead</name>
<value>230</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.dynamicAllocation.initialExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.minExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.maxExecutors</name>
<value>2147483647</value>
</property>
<property>
<name>hive.metastore.execute.setugi</name>
<value>true</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>hadoop1,hadoop3,hadoop2</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hive.zookeeper.namespace</name>
<value>hive_zookeeper_namespace_hive</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.class</name>
<value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>false</value>
</property>
<property>
<name>spark.shuffle.service.enabled</name>
<value>true</value>
</property>
</configuration>
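上面的 hive-site.xml 主要供客户端读取 metastore 地址等信息。作为补充,下面给出一个通过 JDBC 访问 HiveServer2 的最小示意(假设 HiveServer2 运行在 hadoop1 的默认端口 10000,该端口并未出现在上面的配置里,用户名密码也仅为占位):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // 加载 Hive JDBC 驱动(需要 hive-jdbc 依赖)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // 端口 10000 为 HiveServer2 默认端口,属于假设
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hadoop1:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```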
五.Flume开发
xz_bigdata_flume
FTP -> FlumeSource -> 拦截器 -> FlumeChannel -> FlumeSink -> Kafka
自定义的内容有:FlumeSource、拦截器、FlumeSink
1、maven冲突解决和pom.xml
1.1 安装Maven Helper插件,在Settings里面的Plugins里面搜索Maven Helper,点击Install,安装完毕。
1.2 ETL包括数据的抽取、转换、加载
①数据抽取:从源数据源系统抽取目的数据源系统需要的数据;
②数据转换:将从源数据源获取的数据按照业务需求,转换成目的数据源要求的形式,并对错误、不一致的数据进行清洗和加工;
③数据加载:将转换后的数据装载到目的数据源。
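按上面抽取、转换、加载三步,本工程处理的一行 txt 数据大致会经历下面这样的过程,这里给出一个极简示意(示例数据与字段顺序仅为演示,真实清洗逻辑在后面的拦截器与 DataCheck 中):

```java
import java.util.HashMap;
import java.util.Map;

public class EtlLineDemo {
    public static void main(String[] args) {
        // 抽取:一行以 \t 分隔的原始数据(示例数据,字段顺序参考通用字段定义)
        String line = "460011418603055\t24.000000\t25.000000\taa-aa-aa-aa-aa-aa";
        String[] fields = {"imsi", "longitude", "latitude", "phone_mac"};
        // 转换:按字段名封装为 Map,并做简单清洗(去空格、MAC 统一为小写)
        String[] values = line.split("\t");
        Map<String, String> record = new HashMap<String, String>();
        for (int i = 0; i < fields.length && i < values.length; i++) {
            record.put(fields[i], values[i].trim().toLowerCase());
        }
        // 加载:这里仅打印,实际会写入 Kafka / ES / HBase
        System.out.println(record);
    }
}
```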
1.3 pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_flume</artifactId>
<name>xz_bigdata_flume</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<flume-ng.version>1.6.0</flume-ng.version>
<hadoop.version>2.6.0</hadoop.version>
<jdom.version>1.0</jdom.version>
<c3p0.version>0.9.5</c3p0.version>
<hadoop.version>2.6.0</hadoop.version>
<mybatis.version>3.1.1</mybatis.version>
<zookeeper.version>3.4.6</zookeeper.version>
<net.sf.json.version>2.2.3</net.sf.json.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>fastjson</artifactId>
<groupId>com.alibaba</groupId>
</exclusion>
<exclusion>
<artifactId>commons-configuration</artifactId>
<groupId>commons-configuration</groupId>
</exclusion>
<exclusion>
<artifactId>commons-io</artifactId>
<groupId>commons-io</groupId>
</exclusion>
<exclusion>
<artifactId>commons-lang</artifactId>
<groupId>commons-lang</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_kafka</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>snappy-java</artifactId>
<groupId>org.xerial.snappy</groupId>
</exclusion>
<exclusion>
<artifactId>scala-library</artifactId>
<groupId>org.scala-lang</groupId>
</exclusion>
<exclusion>
<artifactId>zookeeper</artifactId>
<groupId>org.apache.zookeeper</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-api</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
</exclusions>
</dependency>
<!--flume核心依赖-->
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>${flume-ng.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>slf4j-api</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>guava</artifactId>
<groupId>com.google.guava</groupId>
</exclusion>
<exclusion>
<artifactId>commons-codec</artifactId>
<groupId>commons-codec</groupId>
</exclusion>
<exclusion>
<artifactId>commons-logging</artifactId>
<groupId>commons-logging</groupId>
</exclusion>
<exclusion>
<artifactId>jetty</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-util</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>commons-io</artifactId>
<groupId>commons-io</groupId>
</exclusion>
<exclusion>
<artifactId>commons-lang</artifactId>
<groupId>commons-lang</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-sdk</artifactId>
<version>${flume-ng.version}-${cdh.version}</version>
</dependency>
<!--flume配置依赖-->
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-configuration</artifactId>
<version>${flume-ng.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>guava</artifactId>
<groupId>com.google.guava</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>jdom</groupId>
<artifactId>jdom</artifactId>
<version>${jdom.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
</dependencies>
<build>
<defaultGoal>compile</defaultGoal>
<sourceDirectory>src/main/java/</sourceDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>copy</id>
<phase>install</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>
${project.build.directory}/jars
</outputDirectory>
<excludeArtifactIds>javaee-api</excludeArtifactIds>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>2.7</version>
<configuration>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
2、自定义source
2.1 继承AbstractSource 实现 Configurable, PollableSource接口
package com.hsiehchou.flume.source;
import com.hsiehchou.flume.constant.FlumeConfConstant;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.utils.FileUtilsStronger;
import org.apache.commons.io.FileUtils;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.PollableSource;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.apache.log4j.Logger;
import java.io.File;
import java.util.*;
/**
* 固定写法,自定义Source 直接继承 AbstractSource 和 实现 Configurable, PollableSource 接口
* 可参照官网 http://flume.apache.org/releases/content/1.9.0/FlumeDeveloperGuide.html#source
*/
public class FolderSource extends AbstractSource implements Configurable, PollableSource {
private final Logger logger = Logger.getLogger(FolderSource.class);
//tier1.sources.source1.sleeptime=5
//tier1.sources.source1.filenum=3000
//tier1.sources.source1.dirs =/usr/chl/data/filedir/
//tier1.sources.source1.successfile=/usr/chl/data/filedir_successful/
//以下为配置在flume.conf文件中
//读取的文件目录
private String dirStr;
//读取的文件目录,如果多个,以","分割,在flume.conf里面配置
private String[] dirs;
//处理成功的文件写入的目录
private String successfile;
//睡眠时间
private long sleeptime = 5;
//每批文件数量
private int filenum = 500;
//以下为配置在txtparse.properties文件中
//读取的所有文件集合
private Collection<File> allFiles;
//一批处理的文件大小
private List<File> listFiles;
private ArrayList<Event> eventList = new ArrayList<Event>();
/**
* @param context 拿到flume配置里面的所有参数
*/
@Override
public void configure(Context context) {
logger.info("开始初始化flume参数");
initFlumeParams(context);
logger.info("初始化flume参数成功");
}
@Override
public Status process() {
//定义处理逻辑
try {
Thread.currentThread().sleep(sleeptime * 1000);
} catch (InterruptedException e) {
logger.error(null, e);
}
Status status = null;
try {
// for (String dir : dirs) {
logger.info("dirStr===========" + dirStr);
//TODO 1.监控目录下面的所有文件
//读取目录下的文件,获取目录下所有以 "txt", "bcp" 结尾的文件
allFiles = FileUtils.listFiles(new File(dirStr), new String[]{"txt", "bcp"}, true);
//如果目录下文件总数大于阈值,则只取 filenum 个文件进行处理
if (allFiles.size() >= filenum) {
//文件数量大于3000 只取3000条
listFiles = ((List<File>) allFiles).subList(0, filenum);
} else {
//文件数量小于3000,取所有文件进行处理
listFiles = ((List<File>) allFiles);
}
//TODO 2.遍历所有的文件进行解析
if (listFiles.size() > 0) {
for (File file : listFiles) {
//文件名是需要传到channel中的
String fileName = file.getName();
//解析文件 获取文件名及文件内容 文件绝对路径 文件内容
Map<String, Object> stringObjectMap = FileUtilsStronger.parseFile(file, successfile);
//返回的内容2个参数 一个是文件绝对路径 另一个是lines文件的所有内容
//获取文件绝对路径
String absoluteFilename = (String) stringObjectMap.get(MapFields.ABSOLUTE_FILENAME);
//获取文件内容
List<String> lines = (List<String>) stringObjectMap.get(MapFields.VALUE);
//TODO 解析出来之后,需要把解析出来的数据封装为Event
if (lines != null && lines.size() > 0) {
//遍历读取的内容
for (String line : lines) {
//封装event Header 将文件名及文件绝对路径通过header传送到channel中
//构建event头
Map<String, String> map = new HashMap<String, String>();
//文件名
map.put(MapFields.FILENAME, fileName);
//文件绝对路径
map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
//构建event
SimpleEvent event = new SimpleEvent();
//把读取的一行数据转成字节
byte[] bytes = line.getBytes();
event.setBody(bytes);
event.setHeaders(map);
eventList.add(event);
}
}
try {
if (eventList.size() > 0) {
//获取channelProcessor
ChannelProcessor channelProcessor = getChannelProcessor();
//通过channelProcessor把eventList发送出去,可以通过拦截器进行拦截
channelProcessor.processEventBatch(eventList);
logger.info("批量推送到 拦截器 数据大小为" + eventList.size());
}
eventList.clear();
} catch (Exception e) {
eventList.clear();
logger.error("发送数据到channel失败", e);
} finally {
eventList.clear();
}
}
}
// 处理成功,返回成功状态
status = Status.READY;
return status;
} catch (Exception e) {
status = Status.BACKOFF;
logger.error("异常", e);
return status;
}
}
/**
* 初始化flume参数
* @param context
*/
public void initFlumeParams(Context context) {
//读取flume.conf配置文件,初始化参数
try {
//文件处理目录
//监控的文件目录
dirStr = context.getString(FlumeConfConstant.DIRS);
//监控多个目录
dirs = dirStr.split(",");
//成功处理的文件存放目录
successfile = context.getString(FlumeConfConstant.SUCCESSFILE);
//每批处理文件个数
filenum = context.getInteger(FlumeConfConstant.FILENUM);
//睡眠时间
sleeptime = context.getLong(FlumeConfConstant.SLEEPTIME);
logger.info("dirStr============" + dirStr);
logger.info("dirs==============" + dirs);
logger.info("successfile=======" + successfile);
logger.info("filenum===========" + filenum);
logger.info("sleeptime=========" + sleeptime);
} catch (Exception e) {
logger.error("初始化flume参数失败", e);
}
}
@Override
public long getBackOffSleepIncrement() {
return 0;
}
@Override
public long getMaxBackOffSleepInterval() {
return 0;
}
}
2.2 实现process()方法
此处代码已经在2.1里面,不用再写了
source/MySource.java—Flume官网上的案例
package com.hsiehchou.flume.source;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
public class MySource extends AbstractSource implements Configurable, PollableSource {
private String myProp;
/**
* 配置读取
* @param context
*/
@Override
public void configure(Context context) {
String myProp = context.getString("myProp", "defaultValue");
// Process the myProp value (e.g. validation, convert to another type, ...)
// Store myProp for later retrieval by process() method
this.myProp = myProp;
}
/**
* 定义自己的业务逻辑
* @return
* @throws EventDeliveryException
*/
@Override
public Status process() throws EventDeliveryException {
Status status = null;
try {
// This try clause includes whatever Channel/Event operations you want to do
// Receive new data
//需要把自己的数据封装为event进行传输
Event e = new SimpleEvent();
// Store the Event into this Source's associated Channel(s)
getChannelProcessor().processEvent(e);
status = Status.READY;
} catch (Throwable t) {
// Log exception, handle individual exceptions as needed
status = Status.BACKOFF;
// re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
} finally {
}
return status;
}
@Override
public long getBackOffSleepIncrement() {
return 0;
}
@Override
public long getMaxBackOffSleepInterval() {
return 0;
}
@Override
public void start() {
// Initialize the connection to the external client
}
@Override
public void stop () {
// Disconnect from external client and do any additional cleanup
// (e.g. releasing resources or nulling-out field values) ..
}
}
3、自定义interceptor—数据清洗过滤器
3.1实现Interceptor 接口
package com.hsiehchou.flume.interceptor;
import com.alibaba.fastjson.JSON;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.service.DataCheck;
import org.apache.commons.io.Charsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.interceptor.Interceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
* 数据清洗过滤器
*/
public class DataCleanInterceptor implements Interceptor {
private static final Logger LOG = LoggerFactory.getLogger(DataCleanInterceptor.class);
//datatpye.properties
//private static Map<String,ArrayList<String>> dataMap = DataTypeProperties.dataTypeMap;
/**
* 初始化
*/
@Override
public void initialize() {
}
/**
* 单条处理
* 拦截方法。数据解析,封装,数据清洗
* @param event
* @return
*/
@Override
public Event intercept(Event event) {
SimpleEvent eventNew = new SimpleEvent();
try {
LOG.info("拦截器Event开始执行");
Map<String, String> map = parseEvent(event);
if(map == null){
return null;
}
String lineJson = JSON.toJSONString(map);
LOG.info("拦截器推送数据到channel:" +lineJson);
eventNew.setBody(lineJson.getBytes());
} catch (Exception e) {
LOG.error(null,e);
}
return eventNew;
}
/**
* 批处理
* @param events
* @return
*/
@Override
public List<Event> intercept(List<Event> events) {
List<Event> list = new ArrayList<Event>();
for (Event event : events) {
Event intercept = intercept(event);
if (intercept != null) {
list.add(intercept);
}
}
return list;
}
@Override
public void close() {
}
/**
* 数据解析
* @param event
* @return
*/
public static Map<String,String> parseEvent(Event event){
if (event == null) {
return null;
}
//000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
String line = new String(event.getBody(), Charsets.UTF_8);
//文件名 和 文件绝对路径
String filename = event.getHeaders().get(MapFields.FILENAME);
String absoluteFilename = event.getHeaders().get(MapFields.ABSOLUTE_FILENAME);
//String转map,进行数据校验,检验错误入ES错误表
Map<String, String> map = DataCheck.txtParseAndalidation(line,filename,absoluteFilename);
return map;
//wechat_source1_1111115.txt
//String[] fileNames = filename.split("_");
// String转map,并进行数据长度校验,校验错误入ES错误表
//Map<String, String> map = JZDataCheck.txtParse(type, line, source, filename,absoluteFilename);
//Map<String,String> map = new HashMap<>();
//000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
//String[] split = line.split("\t");
//数据类别
//String dataType = fileNames[0];
//imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
//ArrayList<String> fields = dataMap.get(dataType);
//for (int i = 0; i < split.length; i++) {
// map.put(fields.get(i),split[i]);
//}
//添加ID
//map.put(MapFields.ID, UUID.randomUUID().toString().replace("-",""));
// map.put(MapFields.TABLE, dataType);
// map.put(MapFields.FILENAME, filename);
// map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
// Map<String, String> map = DataCheck.txtParseAndalidation(line,filename,absoluteFilename);
// return map;
}
/**
* 实例化创建
*/
public static class Builder implements Interceptor.Builder {
@Override
public void configure(Context context) {
}
@Override
public Interceptor build() {
return new DataCleanInterceptor();
}
}
}
4、utils工具类
utils/FileUtilsStronger.java
package com.hsiehchou.flume.utils;
import com.hsiehchou.common.time.TimeTranstationUtils;
import com.hsiehchou.flume.fields.MapFields;
import org.apache.commons.io.FileUtils;
import org.apache.log4j.Logger;
import java.io.File;
import java.util.*;
import static java.io.File.separator;
public class FileUtilsStronger {
private static final Logger logger = Logger.getLogger(FileUtilsStronger.class);
/**
* @param file
* @param path
*/
public static Map<String,Object> parseFile(File file, String path) {
Map<String,Object> map=new HashMap<String,Object>();
List<String> lines;
String fileNew = path+ TimeTranstationUtils.Date2yyyy_MM_dd()+getDir(file);
try {
if((new File(fileNew+file.getName())).exists()){
try{
logger.info("文件名已经存在,开始删除同名已经存在文件"+file.getAbsolutePath());
file.delete();
logger.info("删除同名已经存在文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.error("删除同名已经存在文件"+file.getAbsolutePath()+"失败",e);
}
}else{
lines = FileUtils.readLines(file);
map.put(MapFields.ABSOLUTE_FILENAME,fileNew+file.getName());
map.put(MapFields.VALUE,lines);
FileUtils.moveToDirectory(file, new File(fileNew), true);
logger.info("移动文件到"+file.getAbsolutePath()+"到"+fileNew+"成功");
}
} catch (Exception e) {
logger.error("移动文件" + file.getAbsolutePath() + "到" + fileNew + "失败", e);
}
return map;
}
/**
* @param file
* @param path
*/
public static List<String> chanmodName(File file, String path) {
List<String> lines=null;
try {
if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
logger.warn("文件名已经存在,开始删除同名文件" +path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName());
try{
file.delete();
logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
}
}else{
lines = FileUtils.readLines(file);
FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
}
} catch (Exception e) {
logger.error("移动文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
}
return lines;
}
/**
* @param file
* @param path
*/
public static void moveFile2unmanage(File file, String path) {
try {
if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
logger.warn("文件名已经存在,开始删除同名文件" +file.getAbsolutePath());
try{
file.delete();
logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
}
}else{
FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
//logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
}
} catch (Exception e) {
logger.error("移动错误文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
}
}
/**
* @param file
* @param path
*/
public static void shnegtingChanmodName(File file, String path) {
try {
if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
logger.warn("文件名已经存在,开始删除同名文件" +path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName());
try{
file.delete();
logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
}catch (Exception e){
logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
}
}else{
FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
}
} catch (Exception e) {
logger.error("移动文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
}
}
/**
* 获取文件父目录
* @param file
* @return
*/
public static String getDir(File file){
String dir=file.getParent();
StringTokenizer dirs = new StringTokenizer(dir, separator);
List<String> list=new ArrayList<String>();
while(dirs.hasMoreTokens()){
list.add((String)dirs.nextElement());
}
String str="";
for(int i=2;i<list.size();i++){
str=str+separator+list.get(i);
}
return str+"/";
}
}
utils/Validation.java—验证工具类
package com.hsiehchou.flume.utils;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* 验证工具类
*/
@Deprecated
public class Validation {
// ------------------常量定义
/**
* Email正则表达式=
* "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$"
* ;
*/
// public static final String EMAIL =
// "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";;
public static final String EMAIL = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)+";
/**
* 电话号码正则表达式=
* (^(\d{2,4}[-_-—]?)?\d{3,8}([-_-—]?\d{3,8})?([-_-—]?\d{1,7})?$)|
* (^0?1[35]\d{9}$)
*/
public static final String PHONE = "(^(\\d{2,4}[-_-—]?)?\\d{3,8}([-_-—]?\\d{3,8})?([-_-—]?\\d{1,7})?$)|(^0?1[35]\\d{9}$)";
/**
* 手机号码正则表达式=^(13[0-9]|15[0-9]|18[0-9])\d{8}$
*/
public static final String MOBILE = "^((13[0-9])|(14[5-7])|(15[^4])|(17[0-8])|(18[0-9]))\\d{8}$";
/**
* Integer正则表达式 ^-?(([1-9]\d*$)|0)
*/
public static final String INTEGER = "^-?(([1-9]\\d*$)|0)";
/**
* 正整数正则表达式 >=0 ^[1-9]\d*|0$
*/
public static final String INTEGER_NEGATIVE = "^[1-9]\\d*|0$";
/**
* 负整数正则表达式 <=0 ^-[1-9]\d*|0$
*/
public static final String INTEGER_POSITIVE = "^-[1-9]\\d*|0$";
/**
* Double正则表达式 ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
*/
public static final String DOUBLE = "^-?([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0)$";
/**
* 正Double正则表达式 >=0 ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$
*/
public static final String DOUBLE_NEGATIVE = "^[1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0$";
/**
* 负Double正则表达式 <= 0 ^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$
*/
public static final String DOUBLE_POSITIVE = "^(-([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*))|0?\\.0+|0$";
/**
* 年龄正则表达式 ^(?:[1-9][0-9]?|1[01][0-9]|120)$ 匹配0-120岁
*/
public static final String AGE = "^(?:[1-9][0-9]?|1[01][0-9]|120)$";
/**
* 邮编正则表达式 [0-9]\d{5}(?!\d) 国内6位邮编
*/
public static final String CODE = "[0-9]\\d{5}(?!\\d)";
/**
* 匹配由数字、26个英文字母或者下划线组成的字符串 ^\w+$
*/
public static final String STR_ENG_NUM_ = "^\\w+$";
/**
* 匹配由数字和26个英文字母组成的字符串 ^[A-Za-z0-9]+$
*/
public static final String STR_ENG_NUM = "^[A-Za-z0-9]+";
/**
* 匹配由26个英文字母组成的字符串 ^[A-Za-z]+$
*/
public static final String STR_ENG = "^[A-Za-z]+$";
/**
* 过滤特殊字符串正则 regEx=
* "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
*/
public static final String STR_SPECIAL = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
/***
* 日期正则 支持: YYYY-MM-DD YYYY/MM/DD YYYY_MM_DD YYYYMMDD YYYY.MM.DD的形式
*/
public static final String DATE_ALL = "((^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(10|12|0?[13578])([-\\/\\._]?)(3[01]|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(11|0?[469])([-\\/\\._]?)(30|[12][0-9]|0?[1-9])$)"
+ "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(0?2)([-\\/\\._]?)(2[0-8]|1[0-9]|0?[1-9])$)|(^([2468][048]00)([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([3579][26]00)"
+ "([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][0][48])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][0][48])([-\\/\\._]?)"
+ "(0?2)([-\\/\\._]?)(29)$)"
+ "|(^([1][89][2468][048])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][2468][048])([-\\/\\._]?)(0?2)"
+ "([-\\/\\._]?)(29)$)|(^([1][89][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|"
+ "(^([2-9][0-9][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$))";
/***
* 日期正则 支持: YYYY-MM-DD
*/
public static final String DATE_FORMAT1 = "(([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8]))))|((([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[3579][26])00))-02-29)";
/**
* URL正则表达式 匹配 http www ftp
*/
public static final String URL = "^(http|www|ftp|)?(://)?(\\w+(-\\w+)*)(\\.(\\w+(-\\w+)*))*((:\\d+)?)(/(\\w+(-\\w+)*))*(\\.?(\\w)*)(\\?)?"
+ "(((\\w*%)*(\\w*\\?)*(\\w*:)*(\\w*\\+)*(\\w*\\.)*(\\w*&)*(\\w*-)*(\\w*=)*(\\w*%)*(\\w*\\?)*"
+ "(\\w*:)*(\\w*\\+)*(\\w*\\.)*"
+ "(\\w*&)*(\\w*-)*(\\w*=)*)*(\\w*)*)$";
/**
* 身份证正则表达式
*/
public static final String IDCARD = "((11|12|13|14|15|21|22|23|31|32|33|34|35|36|37|41|42|43|44|45|46|50|51|52|53|54|61|62|63|64|65)[0-9]{4})"
+ "(([1|2][0-9]{3}[0|1][0-9][0-3][0-9][0-9]{3}"
+ "[Xx0-9])|([0-9]{2}[0|1][0-9][0-3][0-9][0-9]{3}))";
/**
* 机构代码
*/
public static final String JIGOU_CODE = "^[A-Z0-9]{8}-[A-Z0-9]$";
/**
* 匹配数字组成的字符串 ^[0-9]+$
*/
public static final String STR_NUM = "^[0-9]+$";
// //------------------验证方法
/**
* 判断字段是否为空 符合返回true
* @param str
* @return boolean
*/
public static synchronized boolean StrisNull(String str) {
return null == str || str.trim().length() <= 0 ? true : false;
}
/**
* 判断字段是非空 符合返回true
* @param str
* @return boolean
*/
public static boolean StrNotNull(String str) {
return !StrisNull(str);
}
/**
* 字符串null转空
* @param str
* @return boolean
*/
public static String nulltoStr(String str) {
return StrisNull(str) ? "" : str;
}
/**
* 字符串null赋值默认值
* @param str 目标字符串
* @param defaut 默认值
* @return String
*/
public static String nulltoStr(String str, String defaut) {
return StrisNull(str) ? defaut : str;
}
/**
* 判断字段是否为Email 符合返回true
* @param str
* @return boolean
*/
public static boolean isEmail(String str) {
return Regular(str, EMAIL);
}
/**
* 判断是否为电话号码 符合返回true
* @param str
* @return boolean
*/
public static boolean isPhone(String str) {
return Regular(str, PHONE);
}
/**
* 判断是否为手机号码 符合返回true
* @param str
* @return boolean
*/
public static boolean isMobile(String str) {
return RegularSJHM(str, MOBILE);
}
/**
* 判断是否为Url 符合返回true
* @param str
* @return boolean
*/
public static boolean isUrl(String str) {
return Regular(str, URL);
}
/**
* 判断字段是否为数字 正负整数 正负浮点数 符合返回true
* @param str
* @return boolean
*/
public static boolean isNumber(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为INTEGER 符合返回true
* @param str
* @return boolean
*/
public static boolean isInteger(String str) {
return Regular(str, INTEGER);
}
/**
* 判断字段是否为正整数正则表达式 >=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isINTEGER_NEGATIVE(String str) {
return Regular(str, INTEGER_NEGATIVE);
}
/**
* 判断字段是否为负整数正则表达式 <=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isINTEGER_POSITIVE(String str) {
return Regular(str, INTEGER_POSITIVE);
}
/**
* 判断字段是否为DOUBLE 符合返回true
* @param str
* @return boolean
*/
public static boolean isDouble(String str) {
return Regular(str, DOUBLE);
}
/**
* 判断字段是否为正浮点数正则表达式 >=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isDOUBLE_NEGATIVE(String str) {
return Regular(str, DOUBLE_NEGATIVE);
}
/**
* 判断字段是否为负浮点数正则表达式 <=0 符合返回true
* @param str
* @return boolean
*/
public static boolean isDOUBLE_POSITIVE(String str) {
return Regular(str, DOUBLE_POSITIVE);
}
/**
* 判断字段是否为日期 符合返回true
* @param str
* @return boolean
*/
public static boolean isDate(String str) {
return Regular(str, DATE_ALL);
}
/**
* 验证
* @param str
* @return
*/
public static boolean isDate1(String str) {
return Regular(str, DATE_FORMAT1);
}
/**
* 判断字段是否为年龄 符合返回true
* @param str
* @return boolean
*/
public static boolean isAge(String str) {
return Regular(str, AGE);
}
/**
* 判断字段是否超长 字串为空返回false, 超过长度{leng}返回true 反之返回false
* @param str
* @param leng
* @return boolean
*/
public static boolean isLengOut(String str, int leng) {
return StrisNull(str) ? false : str.trim().length() > leng;
}
/**
* 判断字段是否为身份证 符合返回true
* @param str
* @return boolean
*/
public static boolean isIdCard(String str) {
if (StrisNull(str))
return false;
if (str.trim().length() == 15 || str.trim().length() == 18) {
return Regular(str, IDCARD);
} else {
return false;
}
}
/**
* 判断字段是否为邮编 符合返回true
* @param str
* @return boolean
*/
public static boolean isCode(String str) {
return Regular(str, CODE);
}
/**
* 判断字符串是不是全部是英文字母
* @param str
* @return boolean
*/
public static boolean isEnglish(String str) {
return Regular(str, STR_ENG);
}
/**
* 判断字符串是不是全部是英文字母+数字
* @param str
* @return boolean
*/
public static boolean isENG_NUM(String str) {
return Regular(str, STR_ENG_NUM);
}
/**
* 判断字符串是不是全部是英文字母+数字+下划线
* @param str
* @return boolean
*/
public static boolean isENG_NUM_(String str) {
return Regular(str, STR_ENG_NUM_);
}
/**
* 过滤特殊字符串 返回过滤后的字符串
* @param str
* @return boolean
*/
public static String filterStr(String str) {
Pattern p = Pattern.compile(STR_SPECIAL);
Matcher m = p.matcher(str);
return m.replaceAll("").trim();
}
/**
* 校验机构代码格式
* @return
*/
public static boolean isJigouCode(String str) {
return Regular(str, JIGOU_CODE);
}
/**
* 判断字符串是不是数字组成
* @param str
* @return boolean
*/
public static boolean isSTR_NUM(String str) {
return Regular(str, STR_NUM);
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean Regular(String str, String pattern) {
if (null == str || str.trim().length() <= 0)
return false;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* 匹配是否符合正则表达式pattern 匹配返回true
* @param str 匹配的字符串
* @param pattern 匹配模式
* @return boolean
*/
private static boolean RegularSJHM(String str, String pattern) {
if (null == str || str.trim().length() <= 0){
return false;
}
if(str.contains("+86")){
str=str.replace("+86","");
}
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
return m.matches();
}
/**
* description:匹配yyyyMMddHHmmss格式时间
* @param time
* @return boolean
*/
public static final String yyyyMMddHHmmss = "[0-9]{14}";
public static boolean isyyyyMMddHHmmss(String time) {
if (time == null) {
return false;
}
boolean bool = time.matches(yyyyMMddHHmmss);
return bool;
}
/**
* description:匹配MAC地址格式(aa-bb-cc-dd-ee-ff,连字符分隔)
* @param mac
* @return boolean
*/
public static final String isMac = "^[A-Fa-f0-9]{2}(-[A-Fa-f0-9]{2}){5}$";
public static boolean isMac(String mac) {
if (mac == null) {
return false;
}
boolean bool = mac.matches(isMac);
return bool;
}
/**
* description:匹配10位(秒级)时间戳
* @param timestamp
* @return boolean
*/
public static final String longtime = "[0-9]{10}";
public static boolean isTimestamp(String timestamp) {
if (timestamp == null) {
return false;
}
boolean bool = timestamp.matches(longtime);
return bool;
}
/**
* 判断字段是否为datatype 符合返回true
* @param str
* @return boolean
*/
public static final String DATATYPE = "^\\d{7}$";
public static boolean isDATATYPE(String str) {
return Regular(str, DATATYPE);
}
/**
* 判断字段是否为QQ 符合返回true
* @param str
* @return boolean
*/
public static final String QQ = "^\\d{5,15}$";
public static boolean isQQ(String str) {
return Regular(str, QQ);
}
/**
* 判断字段是否为IMSI 符合返回true
* @param str
* @return boolean
*/
public static final String IMSI = "^4600[0,1,2,3,4,5,6,7,9]\\d{10}|(46011|46020)\\d{10}$";
public static boolean isIMSI(String str) {
return Regular(str, IMSI);
}
/**
* 判断字段是否为IMEI 符合返回true
* @param str
* @return boolean
*/
public static final String IMEI = "^\\d{8}$|^[a-fA-F0-9]{14}$|^\\d{15}$";
public static boolean isIMEI(String str) {return Regular(str, IMEI);}
/**
* 判断字段是否为CAPTURETIME 符合返回true
* @param str
* @return boolean
*/
public static final String CAPTURETIME = "^\\d{10}|(20[0-9][0-9])\\d{10}$";
public static boolean isCAPTURETIME(String str) {return Regular(str, CAPTURETIME);}
/**
* description:检测认证类型
* @param str
* @return boolean
*/
public static final String AUTH_TYPE = "^\\d{7}$";
public static boolean isAUTH_TYPE(String str) {return Regular(str, AUTH_TYPE);}
/**
* description:检测FIRM_CODE
* @param str
* @return boolean
*/
public static final String FIRM_CODE = "^\\d{9}$";
public static boolean isFIRM_CODE(String str) {return Regular(str, FIRM_CODE);}
/**
* description:检测经度
* @param str
* @return boolean
*/
public static final String LONGITUDE = "^-?(([1-9]\\d?)|(1[0-7]\\d)|180)(\\.\\d{1,6})?$";
//public static final String LONGITUDE ="^([-]?(\\d|([1-9]\\d)|(1[0-7]\\d)|(180))(\\.\\d*)\\,[-]?(\\d|([1-8]\\d)|(90))(\\.\\d*))$";
public static boolean isLONGITUDE(String str) {return Regular(str, LONGITUDE);}
/**
* description:检测纬度
*
* @param str
* @return boolean
*/
public static final String LATITUDE = "^-?(([1-8]\\d?)|([1-8]\\d)|90)(\\.\\d{1,6})?$";
public static boolean isLATITUDE(String str) {return Regular(str, LATITUDE);}
public static void main(String[] args) {
boolean bool = isLATITUDE("25.546685");
System.out.println(bool);
}
}
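To sanity-check the patterns above, here is a minimal usage sketch of the Validation helpers (all inputs are made-up test values; the package name follows the import used later in this project):
import com.hsiehchou.flume.utils.Validation;
public class ValidationDemo {
public static void main(String[] args) {
// MAC must be "-"-separated hex pairs, per the isMac pattern above
System.out.println(Validation.isMac("aa-bb-cc-dd-ee-ff"));          // true
System.out.println(Validation.isMac("aa:bb:cc:dd:ee:ff"));          // false
// IMEI: 8 digits, 14 hex characters, or 15 digits
System.out.println(Validation.isIMEI("000000000000000"));           // true (15 digits)
// 14-digit time string yyyyMMddHHmmss
System.out.println(Validation.isyyyyMMddHHmmss("20190518120000"));  // true
// Longitude must stay within [-180, 180]
System.out.println(Validation.isLONGITUDE("181.000000"));           // false
}
}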
5、constant常量
constant/FlumeConfConstant.java
package com.hsiehchou.flume.constant;
public class FlumeConfConstant {
//flumeSource配置
public static final String UNMANAGE="unmanage";
public static final String DIRS="dirs";
public static final String SUCCESSFILE="successfile";
public static final String ALL="all";
public static final String SOURCE="source";
public static final String FILENUM="filenum";
public static final String SLEEPTIME="sleeptime";
//ESSINK配置
public static final String TIMECELL="timecell";
public static final String MAXNUM="maxnum";
public static final String SINK_SOURCE="source";
public static final String THREADNUM="threadnum";
public static final String REDISHOST="redishost";
}
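These keys correspond to the agent configuration shown in section 9 below (dirs, successfile, filenum, sleeptime and so on). Purely as an illustration, and not the actual FolderSource implementation, a configure() method that reads them could look like this:
import org.apache.flume.Context;
import com.hsiehchou.flume.constant.FlumeConfConstant;
// Sketch only: shows how the constants map to the agent config keys
// (field names and defaults here are assumptions, not the real FolderSource code)
public class FolderSourceConfigSketch {
private String[] dirs;
private String successFile;
private int fileNum;
private long sleepTime;
public void configure(Context context) {
dirs = context.getString(FlumeConfConstant.DIRS).split(",");        // tier1.sources.source1.dirs
successFile = context.getString(FlumeConfConstant.SUCCESSFILE);     // tier1.sources.source1.successfile
fileNum = context.getInteger(FlumeConfConstant.FILENUM, 3000);      // tier1.sources.source1.filenum
sleepTime = context.getLong(FlumeConfConstant.SLEEPTIME, 5L);       // tier1.sources.source1.sleeptime
}
}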
constant/TxtConstant.java
package com.hsiehchou.flume.constant;
public class TxtConstant {
public static final String TYPE_ES="TYPE_ES";
public static final String STATIONCENTER="STATIONCENTER";
public static final String APCENTER="APCENTER";
public static final String IPLOGINLOG="IPLOGINLOG";
public static final String IMSIIMEI="IMSIIMEI";
public static final String MACHOUR="MACHOUR";
public static final String TYPE_SITEMANAGE="TYPE_SITEMANAGE";
public static final String JZWA="JZWA";
public static final String FIRMCODE="FIRMCODE";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS2="FILENAME_FIELDS2";
public static final String FILENAME_FIELDS3="FILENAME_FIELDS3";
public static final String FILENAME_FIELDS4="FILENAME_FIELDS4";
public static final String FILENAME_FIELDS5="FILENAME_FIELDS5";
public static final String FILENAME_VALIDATION="FILENAME_VALIDATION";
public static final String AUTHTYPE_LIST="AUTHTYPE_LIST";
public static final String SOURCE_FEIJING="SOURCE_FEIJING";
public static final String SOURCE_650="SOURCE_650";
public static final String OFFICE_11="OFFICE_11";
public static final String OFFICE_12="OFFICE_12";
public static final String WLZK="WLZK";
public static final String FEIJING="FEIJING";
public static final String HLWZC="HLWZC";
public static final String WIFIWL="WIFIWL";
// 错误索引
public static final String ERROR_INDEX="es.errorindex";
public static final String ERROR_TYPE="es.errortype";
//WIFI索引
public static final String WIFILOG_INDEX="es.index.wifilog";
public static final String IPLOGINLOG_TYPE="es.type.iploginlog";
public static final String EMAIL_TYPE="es.type.email";
public static final String FTP_TYPE="es.type.ftp";
public static final String GAME_TYPE="es.type.game";
public static final String HEARTBEAT_TYPE="es.type.heartbeat";
public static final String HTTP_TYPE="es.type.http";
public static final String IMINFO_TYPE="es.type.iminfo";
public static final String ORGANIZATION_TYPE="es.type.organization";
public static final String SEARCH_TYPE="es.type.search";
public static final String IMSIIMEI_TYPE="es.type.imsiimei";
}
6、field字段
field/ErrorMapFields.java
package com.hsiehchou.flume.fields;
public class ErrorMapFields {
public static final String RKSJ="RKSJ";
public static final String RECORD="RECORD";
public static final String LENGTH="LENGTH";
public static final String LENGTH_ERROR="LENGTH_ERROR";
public static final String LENGTH_ERROR_NUM="10001";
public static final String FILENAME="FILENAME";
public static final String FILENAME_ERROR="FILENAME_ERROR";
public static final String FILENAME_ERROR_NUM="10010";
public static final String ABSOLUTE_FILENAME="ABSOLUTE_FILENAME";
public static final String SJHM="SJHM";
public static final String SJHM_ERROR="SJHM_ERROR";
public static final String SJHM_ERRORCODE="10007";
public static final String DATA_TYPE="DATA_TYPE";
public static final String DATA_TYPE_ERROR="DATA_TYPE_ERROR";
public static final String DATA_TYPE_ERRORCODE="10011";
public static final String QQ="QQ";
public static final String QQ_ERROR="QQ_ERROR";
public static final String QQ_ERRORCODE="10002";
public static final String IMSI="IMSI";
public static final String IMSI_ERROR="IMSI_ERROR";
public static final String IMSI_ERRORCODE="10005";
public static final String IMEI="IMEI";
public static final String IMEI_ERROR="IMEI_ERROR";
public static final String IMEI_ERRORCODE="10006";
public static final String MAC="MAC";
public static final String CLIENTMAC="CLIENTMAC";
public static final String STATIONMAC="STATIONMAC";
public static final String BSSID="BSSID";
public static final String MAC_ERROR="MAC_ERROR";
public static final String MAC_ERRORCODE="10003";
public static final String DEVICENUM="DEVICENUM";
public static final String DEVICENUM_ERROR="DEVICENUM_ERROR";
public static final String DEVICENUM_ERRORCODE="10014";
public static final String CAPTURETIME="CAPTURETIME";
public static final String CAPTURETIME_ERROR="CAPTURETIME_ERROR";
public static final String CAPTURETIME_ERRORCODE="10019";
public static final String EMAIL="EMAIL";
public static final String EMAIL_ERROR="EMAIL_ERROR";
public static final String EMAIL_ERRORCODE="10004";
public static final String AUTH_TYPE="AUTH_TYPE";
public static final String AUTH_TYPE_ERROR="AUTH_TYPE_ERROR";
public static final String AUTH_TYPE_ERRORCODE="10020";
public static final String FIRM_CODE="FIRM_CODE";
public static final String FIRMCODE_NUM="FIRMCODE_NUM";
public static final String FIRM_CODE_ERROR="FIRM_CODE_ERROR";
public static final String FIRM_CODE_ERRORCODE="10009";
public static final String STARTTIME="STARTTIME";
public static final String STARTTIME_ERROR="STARTTIME_ERROR";
public static final String STARTTIME_ERRORCODE="10015";
public static final String ENDTIME="ENDTIME";
public static final String ENDTIME_ERROR="ENDTIME_ERROR";
public static final String ENDTIME_ERRORCODE="10016";
public static final String LOGINTIME="LOGINTIME";
public static final String LOGINTIME_ERROR="LOGINTIME_ERROR";
public static final String LOGINTIME_ERRORCODE="10017";
public static final String LOGOUTTIME="LOGOUTTIME";
public static final String LOGOUTTIME_ERROR="LOGOUTTIME_ERROR";
public static final String LOGOUTTIME_ERRORCODE="10018";
public static final String LONGITUDE="LONGITUDE";
public static final String LONGITUDE_ERROR="LONGITUDE_ERROR";
public static final String LONGITUDE_ERRORCODE="10012";
public static final String LATITUDE="LATITUDE";
public static final String LATITUDE_ERROR="LATITUDE_ERROR";
public static final String LATITUDE_ERRORCODE="10013";
//TODO 其他类型DATA_TYPE 记录
public static final String DATA_TYPE_OTHER="DATA_TYPE_OTHER";
public static final String DATA_TYPE_OTHER_ERROR="DATA_TYPE_OTHER_ERROR";
public static final String DATA_TYPE_OTHER_ERRORCODE="10022";
//TODO USERNAME 错误
public static final String USERNAME="USERNAME";
public static final String USERNAME_ERROR="USERNAME_ERROR";
public static final String USERNAME_ERRORCODE="10023";
}
field/MapFields.java
package com.hsiehchou.flume.fields;
public class MapFields {
public static final String ID="id";
public static final String SOURCE="source";
public static final String TYPE="TYPE";
public static final String TABLE="table";
public static final String FILENAME="filename";
public static final String RKSJ="rksj";
public static final String ABSOLUTE_FILENAME="absolute_filename";
public static final String BSSID="BSSID";
public static final String USERNAME="USERNAME";
public static final String DAYID="DAYID";
public static final String FIRMCODE_NUM="FIRMCODE_NUM";
public static final String FIRM_CODE="FIRM_CODE";
public static final String IMEI="IMEI";
public static final String IMSI="IMSI";
public static final String DATA_TYPE_NAME="DATA_TYPE_NAME";
public static final String AUTH_TYPE="AUTH_TYPE";
public static final String AUTH_ACCOUNT="AUTH_ACCOUNT";
//TODO 时间类参数
public static final String CAPTURETIME="CAPTURETIME";
public static final String LOGINTIME="LOGINTIME";
public static final String LOGOUTTIME="LOGOUTTIME";
public static final String STARTTIME="STARTTIME";
public static final String ENDTIME="ENDTIME";
public static final String FIRSTTIME="FIRSTTIME";
public static final String LASTTIME="LASTTIME";
//TODO 去重参数
public static final String COUNT="COUNT";
public static final String DATA_TYPE="DATA_TYPE";
public static final String VALUE="value";
public static final String SITECODE="SITECODE";
public static final String SITECODENEW="SITECODENEW";
public static final String DEVICENUM="DEVICENUM";
public static final String MAC="MAC";
public static final String CLIENTMAC="CLIENTMAC";
public static final String STATIONMAC="STATIONMAC";
public static final String BRAND="BRAND";
public static final String INDEX="INDEX";
public static final String ACTION_TYPE="ACTION_TYPE";
public static final String CITY_CODE="CITY_CODE";
/* public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";*/
}
7、自定义sink
sink/KafkaSink.java—将数据下沉到kafka
package com.hsiehchou.flume.sink;
import com.google.common.base.Throwables;
import com.hsiehchou.kafka.producer.StringProducer;
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.apache.log4j.Logger;
import java.util.ArrayList;
import java.util.List;
public class KafkaSink extends AbstractSink implements Configurable {
private final Logger logger = Logger.getLogger(KafkaSink.class);
private String[] kafkatopics = null;
//private List<KeyedMessage<String,String>> listKeyedMessage=null;
private List<String> listKeyedMessage=null;
private Long proTimestamp=System.currentTimeMillis();
/**
* 配置读取
* @param context
*/
@Override
public void configure(Context context) {
//tier1.sinks.sink1.kafkatopics=chl_test7
//获取 推送kafkatopic参数
kafkatopics = context.getString("kafkatopics").split(",");
logger.info("获取kafka topic配置" + context.getString("kafkatopics"));
listKeyedMessage=new ArrayList<>();
}
@Override
public Status process() throws EventDeliveryException {
logger.info("sink开始执行");
Channel channel = getChannel();
Transaction transaction = channel.getTransaction();
transaction.begin();
try {
//从channel中拿到event
Event event = channel.take();
if (event == null) {
transaction.rollback();
return Status.BACKOFF;
}
// 解析记录 获取事件内容
String recourd = new String(event.getBody());
// 发送数据到kafka
try {
//调用kafka的消息推送,将数据推送到kafka
StringProducer.producer(kafkatopics[0],recourd);
/* if(listKeyedMessage.size()>1000){
logger.info("数据大与10000,推送数据到kafka");
sendListKeyedMessage();
logger.info("数据大与10000,推送数据到kafka成功");
}else if(System.currentTimeMillis()-proTimestamp>=60*1000){
logger.info("时间间隔大与60,推送数据到kafka");
sendListKeyedMessage();
logger.info("时间间隔大与60,推送数据到kafka成功"+listKeyedMessage.size());
}*/
} catch (Exception e) {
logger.error("推送数据到kafka失败" , e);
throw Throwables.propagate(e);
}
transaction.commit();
return Status.READY;
} catch (ChannelException e) {
logger.error(e);
transaction.rollback();
return Status.BACKOFF;
} finally {
if(transaction != null){
transaction.close();
}
}
}
@Override
public synchronized void stop() {
super.stop();
}
/*private void sendListKeyedMessage(){
Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
producer.send(listKeyedMessage);
listKeyedMessage.clear();
proTimestamp=System.currentTimeMillis();
producer.close();
}*/
}
8、service
DataCheck.java—数据校验
package com.hsiehchou.flume.service;
import com.alibaba.fastjson.JSON;
import com.hsiehchou.common.net.HttpRequest;
import com.hsiehchou.common.project.datatype.DataTypeProperties;
import com.hsiehchou.common.time.TimeTranstationUtils;
import com.hsiehchou.flume.fields.ErrorMapFields;
import com.hsiehchou.flume.fields.MapFields;
import org.apache.log4j.Logger;
import java.util.*;
/**
* 数据校验
*/
public class DataCheck {
private final static Logger LOG = Logger.getLogger(DataCheck.class);
/**
* 获取数据类型对应的字段 对应的文件
* 结构为 [ 数据类型1 = [字段1,字段2。。。。],
* 数据类型2 = [字段1,字段2。。。。]]
*/
private static Map<String, ArrayList<String>> dataMap = DataTypeProperties.dataTypeMap;
/**
* 数据解析
* @param line
* @param fileName
* @param absoluteFilename
* @return
*/
public static Map<String, String> txtParse(String line, String fileName, String absoluteFilename) {
Map<String, String> map = new HashMap<String, String>();
String[] fileNames = fileName.split("_");
String dataType = fileNames[0];
if (dataMap.containsKey(dataType)) {
List<String> fields = dataMap.get(dataType.toLowerCase());
String[] splits = line.split("\t");
//长度校验
if (fields.size() == splits.length) {
//添加公共字段
map.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
map.put(MapFields.TABLE, dataType.toLowerCase());
map.put(MapFields.RKSJ, (System.currentTimeMillis() / 1000) + "");
map.put(MapFields.FILENAME, fileName);
map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
for (int i = 0; i < splits.length; i++) {
map.put(fields.get(i), splits[i]);
}
} else {
map = null;
LOG.error("字段长度不匹配fields"+fields.size() + "/t" + splits.length);
}
} else {
map = null;
LOG.error("配置文件中不存在此数据类型");
}
return map;
}
/**
* 数据长度校验添加必要字段并转map,将长度不符合的插入ES数据库
* @param line
* @param fileName
* @param absoluteFilename
* @return
*/
public static Map<String, String> txtParseAndalidation(String line, String fileName, String absoluteFilename) {
Map<String, String> map = new HashMap<String, String>();
Map<String, Object> errorMap = new HashMap<String, Object>();
//文件名按"_"切分 wechat_source1_1111142.txt
//wechat 数据类型
//source1 数据来源
//1111142 不让文件名相同
String[] fileNames = fileName.split("_");
String dataType = fileNames[0];
String source = fileNames[1];
if (dataMap.containsKey(dataType)) {
//获取数据类型字段
// imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
//根据数据类型,获取该类型的字段
List<String> fields = dataMap.get(dataType.toLowerCase());
//line
String[] splits = line.split("\t");
//长度校验
if (fields.size() == splits.length) {
for (int i = 0; i < splits.length; i++) {
map.put(fields.get(i), splits[i]);
}
//添加公共字段
// map.put(SOURCE, source);
map.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
map.put(MapFields.TABLE, dataType.toLowerCase());
map.put(MapFields.RKSJ, (System.currentTimeMillis() / 1000) + "");
map.put(MapFields.FILENAME, fileName);
map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
//数据封装完成 开始进行数据校验
errorMap = DataValidation.dataValidation(map);
} else {
errorMap.put(ErrorMapFields.LENGTH, "字段数不匹配 实际" + fields.size() + "\t" + "结果" + splits.length);
errorMap.put(ErrorMapFields.LENGTH_ERROR, ErrorMapFields.LENGTH_ERROR_NUM);
LOG.info("字段数不匹配 实际" + fields.size() + "\t" + "结果" + splits.length);
map = null;
}
//判断数据是否存在错误
if (null != errorMap && errorMap.size() > 0) {
LOG.info("errorMap===" + errorMap);
if ("1".equals("1")) {
//addErrorMapES(errorMap, map, fileName, absoluteFilename);
//验证没通过,将错误数据写到ES,并将map置空
addErrorMapESByHTTP(errorMap, map, fileName, absoluteFilename);
}
map = null;
}
} else {
map = null;
LOG.error("配置文件中不存在此数据类型");
}
return map;
}
/**
* 将错误信息写入ES,方便查错
* @param errorMap
* @param map
* @param fileName
* @param absoluteFilename
*/
public static void addErrorMapESByHTTP(Map<String, Object> errorMap, Map<String, String> map, String fileName, String absoluteFilename) {
String errorType = fileName.split("_")[0];
errorMap.put(MapFields.TABLE, errorType);
errorMap.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
errorMap.put(ErrorMapFields.RECORD, map);
errorMap.put(ErrorMapFields.FILENAME, fileName);
errorMap.put(ErrorMapFields.ABSOLUTE_FILENAME, absoluteFilename);
errorMap.put(ErrorMapFields.RKSJ, TimeTranstationUtils.Date2yyyy_MM_dd_HH_mm_ss());
String url="http://192.168.116.201:9200/error_recourd/error_recourd/"+ errorMap.get(MapFields.ID).toString();
String json = JSON.toJSONString(errorMap);
HttpRequest.sendPost(url,json);
//HttpRequest.sendPostMessage(url, errorMap);
}
/*
public static void addErrorMapES(Map<String, Object> errorMap, Map<String, String> map, String fileName, String absoluteFilename) {
String errorType = fileName.split("_")[0];
errorMap.put(MapFields.TABLE, errorType);
errorMap.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
errorMap.put(ErrorMapFields.RECORD, map);
errorMap.put(ErrorMapFields.FILENAME, fileName);
errorMap.put(ErrorMapFields.ABSOLUTE_FILENAME, absoluteFilename);
errorMap.put(ErrorMapFields.RKSJ, TimeTranstationUtils.Date2yyyy_MM_dd_HH_mm_ss());
TransportClient client = null;
try {
LOG.info("开始获取客户端===============================" + errorMap);
client = ESClientUtils.getClient();
} catch (Throwable t) {
if (t instanceof Error) {
throw (Error)t;
}
LOG.error(null,t);
}
//JestClient jestClient = JestService.getJestClient();
//boolean bool = JestService.indexOne(jestClient,TxtConstant.ERROR_INDEX, TxtConstant.ERROR_TYPE,errorMap.get(MapFields.ID).toString(),errorMap);
LOG.info("开始写入错误数据到ES===============================" + errorMap);
boolean bool = IndexUtil.putIndexData(TxtConstant.ERROR_INDEX, TxtConstant.ERROR_TYPE, errorMap.get(MapFields.ID).toString(), errorMap,client);
if(bool){
LOG.info("写入错误数据到ES===============================" + errorMap);
}else{
LOG.info("写入错误数据到ES===============================失败");
}
}*/
public static void main(String[] args) {
}
}
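To make the parsing logic concrete, here is a small sketch that feeds one wechat line into txtParse; the field order follows the comment above and every value is made-up test data:
import java.util.Map;
import com.hsiehchou.flume.service.DataCheck;
public class DataCheckDemo {
public static void main(String[] args) {
// One tab-separated wechat record: 14 fields in the order
// imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,
// username,phone,object_username,send_message,accept_message,message_time
String line = String.join("\t",
"000000000000000", "460011418603055", "116.123456", "39.123456",
"aa-aa-aa-aa-aa-aa", "bb-bb-bb-bb-bb-bb", "32109231", "1557305988",
"andiy", "18609765432", "judy", "hi", "ok", "1557305988");
// File name pattern: datatype_source_uuid.txt
Map<String, String> record = DataCheck.txtParse(line,
"wechat_source1_1111142.txt",
"/usr/chl/data/filedir/wechat_source1_1111142.txt");
// On success the map holds the 14 business fields plus the common fields
// id, table, rksj, filename and absolute_filename added by txtParse
System.out.println(record);
}
}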
DataValidation.java
package com.hsiehchou.flume.service;
import com.hsiehchou.flume.fields.ErrorMapFields;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.utils.Validation;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class DataValidation {
private static final Logger LOG = LoggerFactory.getLogger(DataValidation.class);
// private static final TxtConfigurationFileReader reader = TxtConfigurationFileReader.getInstance();
// private static final DataTypeConfigurationFileReader datatypereader = DataTypeConfigurationFileReader.getInstance();
// private static final ValidationConfigurationFileReader readerValidation = ValidationConfigurationFileReader.getInstance();
private static Map<String,String> dataTypeMap;
private static List<String> listAuthType;
private static String isErrorES;
private static final String USERNAME=ErrorMapFields.USERNAME;
private static final String DATA_TYPE=ErrorMapFields.DATA_TYPE;
private static final String DATA_TYPE_ERROR=ErrorMapFields.DATA_TYPE_ERROR;
private static final String DATA_TYPE_ERRORCODE=ErrorMapFields.DATA_TYPE_ERRORCODE;
private static final String SJHM=ErrorMapFields.SJHM;
private static final String SJHM_ERROR=ErrorMapFields.SJHM_ERROR;
private static final String SJHM_ERRORCODE=ErrorMapFields.SJHM_ERRORCODE;
private static final String QQ=ErrorMapFields.QQ;
private static final String QQ_ERROR=ErrorMapFields.QQ_ERROR;
private static final String QQ_ERRORCODE=ErrorMapFields.QQ_ERRORCODE;
private static final String IMSI=ErrorMapFields.IMSI;
private static final String IMSI_ERROR=ErrorMapFields.IMSI_ERROR;
private static final String IMSI_ERRORCODE=ErrorMapFields.IMSI_ERRORCODE;
private static final String IMEI=ErrorMapFields.IMEI;
private static final String IMEI_ERROR=ErrorMapFields.IMEI_ERROR;
private static final String IMEI_ERRORCODE=ErrorMapFields.IMEI_ERRORCODE;
private static final String MAC=ErrorMapFields.MAC;
private static final String CLIENTMAC=ErrorMapFields.CLIENTMAC;
private static final String STATIONMAC=ErrorMapFields.STATIONMAC;
private static final String BSSID=ErrorMapFields.BSSID;
private static final String MAC_ERROR=ErrorMapFields.MAC_ERROR;
private static final String MAC_ERRORCODE=ErrorMapFields.MAC_ERRORCODE;
private static final String DEVICENUM=ErrorMapFields.DEVICENUM;
private static final String DEVICENUM_ERROR=ErrorMapFields.DEVICENUM_ERROR;
private static final String DEVICENUM_ERRORCODE=ErrorMapFields.DEVICENUM_ERRORCODE;
private static final String CAPTURETIME=ErrorMapFields.CAPTURETIME;
private static final String CAPTURETIME_ERROR=ErrorMapFields.CAPTURETIME_ERROR;
private static final String CAPTURETIME_ERRORCODE=ErrorMapFields.CAPTURETIME_ERRORCODE;
private static final String EMAIL=ErrorMapFields.EMAIL;
private static final String EMAIL_ERROR=ErrorMapFields.EMAIL_ERROR;
private static final String EMAIL_ERRORCODE=ErrorMapFields.EMAIL_ERRORCODE;
private static final String AUTH_TYPE=ErrorMapFields.AUTH_TYPE;
private static final String AUTH_TYPE_ERROR=ErrorMapFields.AUTH_TYPE_ERROR;
private static final String AUTH_TYPE_ERRORCODE=ErrorMapFields.AUTH_TYPE_ERRORCODE;
private static final String FIRM_CODE=ErrorMapFields.FIRM_CODE;
private static final String FIRM_CODE_ERROR=ErrorMapFields.FIRM_CODE_ERROR;
private static final String FIRM_CODE_ERRORCODE=ErrorMapFields.FIRM_CODE_ERRORCODE;
private static final String STARTTIME=ErrorMapFields.STARTTIME;
private static final String STARTTIME_ERROR=ErrorMapFields.STARTTIME_ERROR;
private static final String STARTTIME_ERRORCODE=ErrorMapFields.STARTTIME_ERRORCODE;
private static final String ENDTIME=ErrorMapFields.ENDTIME;
private static final String ENDTIME_ERROR=ErrorMapFields.ENDTIME_ERROR;
private static final String ENDTIME_ERRORCODE=ErrorMapFields.ENDTIME_ERRORCODE;
private static final String LOGINTIME=ErrorMapFields.LOGINTIME;
private static final String LOGINTIME_ERROR=ErrorMapFields.LOGINTIME_ERROR;
private static final String LOGINTIME_ERRORCODE=ErrorMapFields.LOGINTIME_ERRORCODE;
private static final String LOGOUTTIME=ErrorMapFields.LOGOUTTIME;
private static final String LOGOUTTIME_ERROR=ErrorMapFields.LOGOUTTIME_ERROR;
private static final String LOGOUTTIME_ERRORCODE=ErrorMapFields.LOGOUTTIME_ERRORCODE;
public static Map<String, Object> dataValidation( Map<String, String> map){
if(map == null){
return null;
}
Map<String,Object> errorMap = new HashMap<String,Object>();
//验证手机号码
sjhmValidation(map,errorMap);
//验证MAC
macValidation(map,errorMap);
//验证经纬度
longlaitValidation(map,errorMap);
//定义自己的清洗规则
//TODO 大小写统一
//TODO 时间类型统一
//TODO 数据字段统一
//TODO 业务字段转换
//TODO 数据矫正
//TODO 验证MAC不能为空
//TODO 验证IMSI不能为空
//TODO 验证 QQ IMSI IMEI
//TODO 验证DEVICENUM是否为空 为空返回错误
/*devicenumValidation(map,errorMap);
//TODO 验证CAPTURETIME是否为空 为空过滤 不为10,14位数字过滤
capturetimeValidation(map,errorMap);
//TODO 验证EMAIL
emailValidation(map,errorMap);
//TODO 验证STARTTIME ENDTIME LOGINTIME LOGOUTTIME
timeValidation(map,errorMap);
*/
return errorMap;
}
/**
* 手机号码验证
* @param map
* @param errorMap
*/
public static void sjhmValidation(Map<String, String> map,Map<String,Object> errorMap){
if(map.containsKey("phone")){
String sjhm=map.get("phone");
//调用正则做手机号码验证,是否是正确的一个,检验
boolean ismobile = Validation.isMobile(sjhm);
if(!ismobile){
errorMap.put(SJHM,sjhm);
errorMap.put(SJHM_ERROR,SJHM_ERRORCODE);
}
}
}
//TODO QQ验证 10002 QQ编码 1030001 需要根据DATATYPE来判断数据类型的一起验证
public static void virtualValidation(String dataType, Map<String, String> map,Map<String,Object> errorMap){
//TODO USERNAME验证 10023 长度》=2
if(map.containsKey(ErrorMapFields.USERNAME)){
String username=map.get(ErrorMapFields.USERNAME);
if(StringUtils.isNotBlank(username)){
if(username.length()<2){
errorMap.put(ErrorMapFields.USERNAME,username);
errorMap.put(ErrorMapFields.USERNAME_ERROR,ErrorMapFields.USERNAME_ERRORCODE);
}
}
}
//TODO QQ验证 10002 QQ编码 1030001
if("1030001".equals(dataType)&& map.containsKey(USERNAME)){
String qqnum= map.get(USERNAME);
boolean bool = Validation.isQQ(qqnum);
if(!bool){
errorMap.put(QQ,qqnum);
errorMap.put(QQ_ERROR,QQ_ERRORCODE);
}
}
//TODO IMSI验证 10005 IMSI编码 1429997
if("1429997".equals(dataType)&& map.containsKey(IMSI)){
String imsi= map.get(IMSI);
boolean bool = Validation.isIMSI(imsi);
if(!bool){
errorMap.put(IMSI,imsi);
errorMap.put(IMSI_ERROR,IMSI_ERRORCODE);
}
}
//TODO IMEI验证 10006 IMEI编码 1429998
if("1429998".equals(dataType)&& map.containsKey(IMEI)){
String imei= map.get(IMEI);
boolean bool = Validation.isIMEI(imei);
if(!bool){
errorMap.put(IMEI,imei);
errorMap.put(IMEI_ERROR,IMEI_ERRORCODE);
}
}
}
//MAC验证 10003
public static void macValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey("phone_mac")){
String mac=map.get("phone_mac");
if(StringUtils.isNotBlank(mac)){
boolean bool = Validation.isMac(mac);
if(!bool){
LOG.info("MAC验证失败");
errorMap.put(MAC,mac);
errorMap.put(MAC_ERROR,MAC_ERRORCODE);
}
}else{
LOG.info("MAC验证失败");
errorMap.put(MAC,mac);
errorMap.put(MAC_ERROR,MAC_ERRORCODE);
}
}
}
/**
* TODO DEVICENUM 验证 为空过滤
* @param map
* @param errorMap
*/
public static void devicenumValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey("device_number")){
String devicenum=map.get("device_number");
if(StringUtils.isBlank(devicenum)){
errorMap.put(DEVICENUM,"设备编码不能为空");
errorMap.put(DEVICENUM_ERROR,DEVICENUM_ERRORCODE);
}
}
}
/**
* TODO CAPTURETIME验证 为空过滤 10019 验证时间长度为10或14位
* @param map
* @param errorMap
*/
public static void capturetimeValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey(CAPTURETIME)){
String capturetime=map.get(CAPTURETIME);
if(StringUtils.isBlank(capturetime)){
errorMap.put(CAPTURETIME,"CAPTURETIME不能为空");
errorMap.put(CAPTURETIME_ERROR,CAPTURETIME_ERRORCODE);
}else{
boolean bool = Validation.isCAPTURETIME(capturetime);
if(!bool){
errorMap.put(CAPTURETIME,capturetime);
errorMap.put(CAPTURETIME_ERROR,CAPTURETIME_ERRORCODE);
}
}
}
}
//TODO EMAIL验证 为空过滤 为错误过滤 10004 通过TABLE取USERNAME验证
public static void emailValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.get("TABLE").equals(EMAIL)){
String email=map.get(USERNAME);
if(StringUtils.isNotBlank(email)){
boolean bool = Validation.isEmail(email);
if(!bool){
errorMap.put(EMAIL,email);
errorMap.put(EMAIL_ERROR,EMAIL_ERRORCODE);
}
}else{
errorMap.put(EMAIL,"EMAIL不能为空");
errorMap.put(EMAIL_ERROR,EMAIL_ERRORCODE);
}
}
}
//TODO EMAIL验证 为空过滤 为错误过滤 10004 通过TABLE取USERNAME验证
public static void timeValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey(STARTTIME)&&map.containsKey(ENDTIME)){
String starttime=map.get(STARTTIME);
String endtime=map.get(ENDTIME);
if(StringUtils.isBlank(starttime)&&StringUtils.isBlank(endtime)){
errorMap.put(STARTTIME,"STARTTIME和ENDTIME不能同时为空");
errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
errorMap.put(ENDTIME,"STARTTIME和ENDTIME不能同时为空");
errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
}else{
Boolean bool1 = istime(starttime, STARTTIME, STARTTIME_ERROR, STARTTIME_ERRORCODE, errorMap);
Boolean bool2 = istime(endtime, ENDTIME, ENDTIME_ERROR, ENDTIME_ERRORCODE, errorMap);
if(bool1&&bool2&&(starttime.length()!=endtime.length())){
errorMap.put(STARTTIME,"STARTTIME和ENDTIME长度不等 STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
errorMap.put(ENDTIME,"STARTTIME和ENDTIME长度不等 STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
}
else if(bool1&&bool2&&(endtime.compareTo(starttime)<0)){
errorMap.put(STARTTIME,"ENDTIME必须大于STARTTIME STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
errorMap.put(ENDTIME,"ENDTIME必须大于STARTTIME STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
}
}
}else if(map.containsKey(LOGINTIME)&&map.containsKey(LOGOUTTIME)){
String logintime=map.get(LOGINTIME);
String logouttime=map.get(LOGOUTTIME);
if(StringUtils.isBlank(logintime)&&StringUtils.isBlank(logouttime)){
errorMap.put(LOGINTIME,"LOGINTIME和LOGOUTTIME不能同时为空");
errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
errorMap.put(LOGOUTTIME,"LOGINTIME和LOGOUTTIME不能同时为空");
errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
}else{
Boolean bool1 = istime(logintime, LOGINTIME, LOGINTIME_ERROR, LOGINTIME_ERRORCODE, errorMap);
Boolean bool2 = istime(logouttime, LOGOUTTIME, LOGOUTTIME_ERROR, LOGOUTTIME_ERRORCODE, errorMap);
if(bool1&&bool2&&(logintime.length()!=logouttime.length())){
errorMap.put(LOGINTIME,"LOGOUTTIME LOGINTIME长度不等 LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
errorMap.put(LOGOUTTIME,"LOGOUTTIME LOGINTIME长度不等 LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
}
else if(bool1&&bool2&&(logouttime.compareTo(logintime)<0)){
errorMap.put(LOGINTIME,"LOGOUTTIME必须大于LOGINTIME LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
errorMap.put(LOGOUTTIME,"LOGOUTTIME必须大于LOGINTIME LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
}
}
}
}
//TODO AUTH_TYPE验证 为空过滤 为错误过滤 10020
public static void authtypeValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
String fileName=map.get(MapFields.FILENAME);
if(fileName.split("_").length<=2){
map = null;
return;
}
if(StringUtils.isNotBlank(fileName)){
if("bh".equals(fileName.split("_")[2])||"wy".equals(fileName.split("_")[2])||"yc".equals(fileName.split("_")[2])){
return ;
}else if(map.containsKey(AUTH_TYPE)){
String authtype=map.get(AUTH_TYPE);
if(StringUtils.isNotBlank(authtype)){
if(listAuthType.contains(authtype)){
if("1020004".equals(authtype)){
String sjhm=map.get(MapFields.AUTH_ACCOUNT);
boolean ismobile = Validation.isMobile(sjhm);
if(!ismobile){
errorMap.put(SJHM,sjhm);
errorMap.put(SJHM_ERROR,SJHM_ERRORCODE);
}
}
if("1020002".equals(authtype)){
String mac=map.get(MapFields.AUTH_ACCOUNT);
boolean ismac = Validation.isMac(mac);
if(!ismac){
errorMap.put(MAC,mac);
errorMap.put(MAC_ERROR,MAC_ERRORCODE);
}
}
}else{
errorMap.put(AUTH_TYPE,"AUTHTYPE_LIST 影射里没有"+ "\t"+ "["+ authtype+"]");
errorMap.put(AUTH_TYPE_ERROR,AUTH_TYPE_ERRORCODE);
}
}else{
errorMap.put(AUTH_TYPE,"AUTH_TYPE 不能为空");
errorMap.put(AUTH_TYPE_ERROR,AUTH_TYPE_ERRORCODE);
}
}
}
}
private static final String LONGITUDE = "longitude";
private static final String LATITUDE = "latitude";
private static final String LONGITUDE_ERROR=ErrorMapFields.LONGITUDE_ERROR;
private static final String LONGITUDE_ERRORCODE=ErrorMapFields.LONGITUDE_ERRORCODE;
private static final String LATITUDE_ERROR=ErrorMapFields.LATITUDE_ERROR;
private static final String LATITUDE_ERRORCODE=ErrorMapFields.LATITUDE_ERRORCODE;
/**
* 经纬度验证 错误过滤 10012 10013
* @param map
* @param errorMap
*/
public static void longlaitValidation( Map<String, String> map,Map<String,Object> errorMap){
if(map == null){
return ;
}
if(map.containsKey(LONGITUDE)&&map.containsKey(LATITUDE)){
String longitude=map.get(LONGITUDE);
String latitude=map.get(LATITUDE);
boolean bool1 = Validation.isLONGITUDE(longitude);
boolean bool2 = Validation.isLATITUDE(latitude);
if(!bool1){
errorMap.put(LONGITUDE,longitude);
errorMap.put(LONGITUDE_ERROR,LONGITUDE_ERRORCODE);
}
if(!bool2){
errorMap.put(LATITUDE,latitude);
errorMap.put(LATITUDE_ERROR,LATITUDE_ERRORCODE);
}
}
}
public static Boolean istime(String time,String str1,String str2,String str3,Map<String,Object> errorMap){
if(StringUtils.isNotBlank(time)){
boolean bool = Validation.isCAPTURETIME(time);
if(!bool){
errorMap.put(str1,time);
errorMap.put(str2,str3);
return false;
}
return true;
}
return false;
}
}
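A minimal sketch of how dataValidation reports problems: the record below deliberately carries an invalid phone number and MAC address (all values are made-up), so the returned errorMap is non-empty:
import java.util.HashMap;
import java.util.Map;
import com.hsiehchou.flume.service.DataValidation;
public class DataValidationDemo {
public static void main(String[] args) {
Map<String, String> record = new HashMap<>();
record.put("phone", "123");                    // fails the mobile-number check
record.put("phone_mac", "zz-zz-zz-zz-zz-zz");  // fails the MAC format check
record.put("longitude", "116.123456");         // passes
record.put("latitude", "39.123456");           // passes
Map<String, Object> errorMap = DataValidation.dataValidation(record);
// errorMap now contains SJHM/SJHM_ERROR and MAC/MAC_ERROR entries;
// in DataCheck a non-empty errorMap sends the record to the ES error index and drops it
System.out.println(errorMap);
}
}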
9、配置CDH上的Agent文件—跟FolderSource等里面读取配置文件相对应
Flume配置:
tier1.sources= source1
tier1.channels=channel1
tier1.sinks=sink1
#定义source1
tier1.sources.source1.type = com.hsiehchou.flume.source.FolderSource
#读取文件之后睡眠时间
tier1.sources.source1.sleeptime=5
tier1.sources.source1.filenum=3000
tier1.sources.source1.dirs =/usr/chl/data/filedir/
tier1.sources.source1.successfile=/usr/chl/data/filedir_successful/
tier1.sources.source1.deserializer.outputCharset=UTF-8
tier1.sources.source1.channels = channel1
# 定义拦截器1
tier1.sources.source1.interceptors=i1
tier1.sources.source1.interceptors.i1.type=com.hsiehchou.flume.interceptor.DataCleanInterceptor$Builder
#定义channel
tier1.channels.channel1.type = memory
tier1.channels.channel1.keep-alive= 300
tier1.channels.channel1.capacity = 1000000
tier1.channels.channel1.transactionCapacity = 5000
tier1.channels.channel1.byteCapacityBufferPercentage = 200
tier1.channels.channel1.byteCapacity = 80000
#定义sink1
tier1.sinks.sink1.type = com.hsiehchou.flume.sink.KafkaSink
tier1.sinks.sink1.kafkatopics = chl_test7
tier1.sinks.sink1.channel = channel1
The Flume source keeps monitoring the FTP upload directory; each record is cleaned by the custom interceptor, pushed into the Flume channel, and finally sunk to Kafka by the custom sink.
10、flume打包到服务器执行
Note: the custom jar must not be placed under the default /usr/lib/flume-ng/plugins.d directory; put it into the plugin directory created below instead.
mkdir -p /var/lib/flume-ng/plugins.d/chl/lib
mkdir -p /usr/chl/data/filedir/
mkdir -p /usr/chl/data/filedir_successful/
Set the directory permissions to 777: the Flume agent runs as the flume user, so the permissions have to be changed.
chmod 777 /usr/chl/data/filedir/
kafka-topics --zookeeper hadoop1:2181 --topic chl_test7 --create --replication-factor 1 --partitions 3
kafka-topics --zookeeper hadoop1:2181 --list
kafka-topics --zookeeper hadoop1:2181 --delete --topic chl_test7
kafka-console-consumer --bootstrap-server hadoop1:9092 --topic chl_test7 --from-beginning
六、Kafka开发
xz_bigdata_kafka
1、pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_kafka</artifactId>
<name>xz_bigdata_kafka</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<poi.version>3.14</poi.version>
<kafka.version>0.9.0-kafka-2.0.2</kafka.version>
<mysql.connector.version>5.1.46</mysql.connector.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>${zookeeper.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>${kafka.version}</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>${poi.version}</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<artifactId>scala-reflect</artifactId>
<groupId>org.scala-lang</groupId>
<version>${scala.version}</version>
</dependency>
</dependencies>
</project>
2、config/KafkaConfig.java—kafka配置文件 解析器
package com.hsiehchou.kafka.config;
import com.hsiehchou.common.config.ConfigUtil;
import kafka.producer.ProducerConfig;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;
/**
* kafka配置文件 解析器
*/
public class KafkaConfig {
private static final Logger LOG = LoggerFactory.getLogger(KafkaConfig.class);
private static final String DEFAULT_CONFIG_PATH = "kafka/kafka-server-config.properties";
private volatile static KafkaConfig kafkaConfig = null;
private ProducerConfig config;
private Properties properties;
private KafkaConfig() throws IOException{
try {
properties = ConfigUtil.getInstance().getProperties(DEFAULT_CONFIG_PATH);
} catch (Exception e) {
IOException ioException = new IOException();
ioException.addSuppressed(e);
throw ioException;
}
config = new ProducerConfig(properties);
}
public static KafkaConfig getInstance(){
if(kafkaConfig == null){
synchronized (KafkaConfig.class) {
if(kafkaConfig == null){
try {
kafkaConfig = new KafkaConfig();
} catch (IOException e) {
LOG.error("实例化kafkaConfig失败", e);
}
}
}
}
return kafkaConfig;
}
public ProducerConfig getProducerConfig(){
return config;
}
/**
* 获取当前时间的字符串 格式为:yyyy-MM-dd HH:mm:ss
* @return String
*/
public static String nowStr(){
return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format( new Date() );
}
}
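The kafka/kafka-server-config.properties file itself is not listed in this document. Since it is passed straight to the old Scala producer's ProducerConfig, it would roughly contain the standard producer keys below; the broker address and values are assumptions and must be adjusted to the actual cluster:
metadata.broker.list=hadoop1:9092
serializer.class=kafka.serializer.StringEncoder
request.required.acks=1
producer.type=sync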
3、producer/StringProducer.java—生产者
package com.hsiehchou.kafka.producer;
import com.hsiehchou.common.thread.ThreadPoolManager;
import com.hsiehchou.kafka.config.KafkaConfig;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
public class StringProducer {
private static final Logger LOG = LoggerFactory.getLogger(StringProducer.class);
public static void main(String[] args) {
StringProducer.producer("chl_test2","{\"rksj\":\"1558177156\",\"latitude\":\"24.000000\",\"imsi\":\"000000000000000\",\"accept_message\":\"\",\"phone_mac\":\"aa-aa-aa-aa-aa-aa\",\"device_mac\":\"bb-bb-bb-bb-bb-bb\",\"message_time\":\"1789098762\",\"filename\":\"wechat_source1_1111119.txt\",\"absolute_filename\":\"/usr/chl/data/filedir_successful/2019-05-18/data/filedir/wechat_source1_1111119.txt\",\"phone\":\"18609765432\",\"device_number\":\"32109231\",\"imei\":\"000000000000000\",\"id\":\"1792d6529e2143fa85717e706403c83c\",\"collect_time\":\"1557305988\",\"send_message\":\"\",\"table\":\"wechat\",\"object_username\":\"judy\",\"longitude\":\"23.000000\",\"username\":\"andiy\"}");
}
private static int threadSize = 6;
/**
* 生产单条消息 单条推送
* @param topic
* @param recourd
*/
public static void producer(String topic,String recourd){
Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
KeyedMessage<String, String> keyedMessage = new KeyedMessage<>(topic, recourd);
producer.send(keyedMessage);
LOG.info("发送数据"+recourd+"到kafka成功");
producer.close();
}
/**
* 批量推送
* @param topic
* @param listRecourd
*/
public static void producerList(String topic,List<String> listRecourd){
Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
List<KeyedMessage<String, String>> listKeyedMessage= new ArrayList<>();
listRecourd.forEach(recourd->{
listKeyedMessage.add(new KeyedMessage<>(topic, recourd));
});
producer.send(listKeyedMessage);
producer.close();
}
/**
* 多线程推送
* @param topic kafka topic
* @param listMessage 消息
* @throws Exception
*/
public void producer(String topic,List<String> listMessage) throws Exception{
// int size = listMessage.size();
List<List<String>> lists = splitList(listMessage, 5);
int threadNum = lists.size();
long t1 = System.currentTimeMillis();
CountDownLatch cdl = new CountDownLatch(threadNum);
//使用线程池
ExecutorService executorService = ThreadPoolManager.getInstance().getExecutorService();
LOG.info("开启 " + threadNum + " 个线程来向 topic " + topic + " 生产数据 . ");
for (int i = 0; i < threadNum; i++) {
try {
executorService.execute(new ProducerTask(topic,lists.get(i),cdl));
} catch (Exception e) {
LOG.error("", e);
}
}
cdl.await();
long t = System.currentTimeMillis() - t1;
LOG.info( " 一共耗时 :" + t + " 毫秒 ... " );
executorService.shutdown();
}
/**
* 拆分消息集合,计算使用多少个线程执行运算
* @param mtList
*/
public static List<List<String>> splitList(List<String> mtList, int splitSize){
if(mtList == null || mtList.size()==0){
return null;
}
int length = mtList.size();
// 计算可以分成多少组
int num = ( length + splitSize - 1 )/splitSize ;
List<List<String>> spiltList = new ArrayList<>(num);
for (int i = 0; i < num; i++) {
// 开始位置
int fromIndex = i * splitSize;
// 结束位置
int toIndex = (i+1) * splitSize < length ? ( i+1 ) * splitSize : length ;
spiltList.add(mtList.subList(fromIndex,toIndex)) ;
}
return spiltList;
}
class ProducerTask implements Runnable{
private String topic;
private List<String> listRecourd;
private CountDownLatch cdl;
public ProducerTask( String topic, List<String> listRecourd, CountDownLatch cdl){
this.topic = topic;
this.listRecourd = listRecourd;
this.cdl = cdl;
}
public void run() {
try {
producerList(topic,listRecourd);
} finally {
//通知 CountDownLatch,否则 producer() 中的 cdl.await() 会一直阻塞
cdl.countDown();
}
}
}
/* public static void producer(String topic,List<KeyedMessage<String,String>> listMessage) throws Exception{
int size = listMessage.size();
int threads = ( ( size - 1 ) / threadSize ) + 1;
long t1 = System.currentTimeMillis();
CountDownLatch cdl = new CountDownLatch(threads);
//使用线程池
ExecutorService executorService = ThreadPoolManager.getInstance().getExecutorService();
LOG.info("开启 " + threads + " 个线程来向 topic " + topic + " 生产数据 . ");
*//* for( int i = 0 ; i < threads ; i++ ){
executorService.execute( new StringProducer.ChildProducer( start , end , topic , id, cdl ));
}*//*
cdl.await();
long t = System.currentTimeMillis() - t1;
LOG.info( " 一共耗时 :" + t + " 毫秒 ... " );
executorService.shutdown();
}
static class ChildProducer implements Runnable{
public ChildProducer( int start , int end , String topic , String id, CountDownLatch cdl ){
}
public void run() {
}
}
*/
}
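Besides the single-record main() above, producerList can push a small batch in one call. A usage sketch with made-up JSON records (topic name taken from the Flume sink configuration):
import java.util.Arrays;
import java.util.List;
import com.hsiehchou.kafka.producer.StringProducer;
public class StringProducerDemo {
public static void main(String[] args) {
// Two fake wechat records pushed to the chl_test7 topic in one batch
List<String> records = Arrays.asList(
"{\"table\":\"wechat\",\"phone\":\"18609765432\",\"phone_mac\":\"aa-aa-aa-aa-aa-aa\"}",
"{\"table\":\"wechat\",\"phone\":\"13888888888\",\"phone_mac\":\"bb-bb-bb-bb-bb-bb\"}");
StringProducer.producerList("chl_test7", records);
}
}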
七、Spark—kafka2es开发
The matching Maven dependency coordinates can be looked up in the Cloudera repository.
SparkStreaming + Kafka can consume data in two modes: receiver mode and Direct mode.
SparkStreaming + Kafka receiver mode
How receiver mode works
After the SparkStreaming application starts, receiver tasks running in the Executors receive the data pushed by Kafka. The data is persisted, by default at storage level MEMORY_AND_DISK_SER_2 (this level can be changed). The receiver task stores and replicates the received data, which involves data transfer between nodes. Once replication completes, the consumer offset is updated in zookeeper and the data location is reported to the receiver tracker in the Driver. Finally the Driver dispatches tasks to the nodes according to data locality.
Problems with receiver mode
When the Driver process dies, all of its Executors are killed with it. If the Driver dies right after the consumer offset in zookeeper has been updated but before the received data has been processed, that data can no longer be found, which effectively means losing data.
How to solve this problem?
Enable the WAL (write ahead log) mechanism: while the received data is being replicated to other nodes, an extra copy is written to HDFS (the persistence level of the received data should then be downgraded to MEMORY_AND_DISK). This guarantees the safety of the data. However, writing to HDFS is expensive, and zookeeper can only be updated and the data location reported after the backup completes, so the job takes longer and the latency of the task increases.
Notes
1) After enabling the WAL, the persistence level of the received data has to be downgraded, which costs some efficiency.
2) Enabling the WAL requires a checkpoint directory.
3) Enabling the WAL (write ahead log) writes an extra copy of the data to HDFS (see the sketch below for how to turn it on).
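A minimal sketch of enabling the WAL for receiver mode; the app name and HDFS paths are assumptions and this is not part of the project code:
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
public class ReceiverWalSketch {
public static void main(String[] args) {
SparkConf conf = new SparkConf()
.setAppName("kafka2es_receiver")
// write every received block to HDFS as well
.set("spark.streaming.receiver.writeAheadLog.enable", "true");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
// the WAL requires a checkpoint directory (HDFS path is an assumption)
jssc.checkpoint("hdfs://hadoop1:8020/user/chl/checkpoint/kafka2es");
// as noted above, the receiver storage level should also be downgraded to
// StorageLevel.MEMORY_AND_DISK when calling KafkaUtils.createStream
}
}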
SparkStreaming + Kafka Direct mode
- Simplified processing pipeline
- Offsets are stored and managed by ourselves, which guarantees zero data loss but may cause duplicate consumption (consumer idempotence has to be handled)
- No receiver is needed; the data is pulled from kafka directly
1、spark下的pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_spark</artifactId>
<name>xz_bigdata_spark</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<spark.version>1.6.0</spark.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_es</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_redis</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_hbase</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<artifactId>gson</artifactId>
<groupId>com.google.code.gson</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>httpcore</artifactId>
<groupId>org.apache.httpcomponents</groupId>
</exclusion>
<exclusion>
<artifactId>httpclient</artifactId>
<groupId>org.apache.httpcomponents</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-api</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>gson</artifactId>
<groupId>com.google.code.gson</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>${spark.version}-${cdh.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-13_2.10</artifactId>
<version>6.2.3</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin><!--打包依赖的jar包-->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
<stripVersion>false</stripVersion> <!-- 去除版本信息 -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- 拷贝项目依赖包到lib/目录下 -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
2、spark中的文件结构
点击”+”号,选择Scala SDK,点击Browse,选择本地下载的scala-sdk-2.10.4
3、xz_bigdata_spark/spark/common
SparkContextFactory.scala
package com.hsiehchou.spark.common
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{Accumulator, SparkContext}
object SparkContextFactory {
def newSparkBatchContext(appName:String = "sparkBatch") : SparkContext = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
new SparkContext(sparkConf)
}
def newSparkLocalBatchContext(appName:String = "sparkLocalBatch" , threads : Int = 2) : SparkContext = {
val sparkConf = SparkConfFactory.newSparkLoalConf(appName, threads)
sparkConf.set("","")
new SparkContext(sparkConf)
}
def getAccumulator(appName:String = "sparkBatch") : Accumulator[Int] = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
val accumulator: Accumulator[Int] = new SparkContext(sparkConf).accumulator(0,"")
accumulator
}
/**
* 创建本地流streamingContext
* @param appName appName
* @param batchInterval 多少秒读取一次
* @param threads 开启多少个线程
* @return
*/
def newSparkLocalStreamingContext(appName:String = "sparkStreaming" ,
batchInterval:Long = 30L ,
threads : Int = 4) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkLocalConf(appName, threads)
// sparkConf.set("spark.streaming.receiver.maxRate","10000")
sparkConf.set("spark.streaming.kafka.maxRatePerPartition","1")
new StreamingContext(sparkConf, Seconds(batchInterval))
}
/**
* 创建集群模式streamingContext
* 这里不设置线程数,在submit中指定
* @param appName
* @param batchInterval
* @return
*/
def newSparkStreamingContext(appName:String = "sparkStreaming" , batchInterval:Long = 30L) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkStreamingConf(appName)
new StreamingContext(sparkConf, Seconds(batchInterval))
}
def startSparkStreaming(ssc:StreamingContext){
ssc.start()
ssc.awaitTermination()
ssc.stop()
}
}
convert/DataConvert.scala
package com.hsiehchou.spark.common.convert
import java.util
import com.hsiehchou.common.config.ConfigUtil
import org.apache.spark.Logging
import scala.collection.JavaConversions._
/**
* 数据类型转换
*/
object DataConvert extends Serializable with Logging {
val fieldMappingPath = "es/mapping/fieldmapping.properties"
private val typeFieldMap: util.HashMap[String, util.HashMap[String, String]] = getEsFieldtypeMap()
/**
* 将Map<String,String>转化为Map<String,Object>
*/
def strMap2esObjectMap(map:util.Map[String,String]):util.Map[String,Object] ={
//获取配置文件中的数据类型
val dataType = map.get("table")
//获取配置文件中的数据类型的 字段类型
val fieldMap = typeFieldMap.get(dataType)
//获取数据类型的所有字段,配置文件里的字段
val keySet = fieldMap.keySet()
//var objectMap:util.HashMap[String,Object] = new util.HashMap[String,Object]()
var objectMap = new java.util.HashMap[String, Object]()
//数据里的字段
val set = map.keySet().iterator()
try {
//遍历真实数据的所有字段
while (set.hasNext()) {
val key = set.next()
var dataType:String = "string"
//如果在配置文件中的key包含真实数据的key
if (keySet.contains(key)) {
//则获取真实数据字段的数据类型
dataType = fieldMap.get(key)
}
dataType match {
case "long" => objectMap = BaseDataConvert.mapString2Long(map, key, objectMap)
case "string" => objectMap = BaseDataConvert.mapString2String(map, key, objectMap)
case "double" => objectMap = BaseDataConvert.mapString2Double(map, key, objectMap)
case _ => objectMap = BaseDataConvert.mapString2String(map, key, objectMap)
}
}
}catch {
case e: Exception => logInfo("转换异常", e)
}
println("转换后" + objectMap)
objectMap
}
/**
* 读取 "es/mapping/fieldmapping.properties 配置文件
* 主要作用是将 真实数据 根据配置来作数据类型转换 转换为和ES mapping结构保持一致
* @return
*/
def getEsFieldtypeMap(): util.HashMap[String, util.HashMap[String, String]] = {
// ["wechat":["phone_mac":"string","latitude":"long"]]
//定义返回Map
val mapMap = new util.HashMap[String, util.HashMap[String, String]]
val properties = ConfigUtil.getInstance().getProperties(fieldMappingPath)
val tables = properties.get("tables").toString.split(",")
val tableFields = properties.keySet()
tables.foreach(table => {
val map = new util.HashMap[String, String]()
tableFields.foreach(tableField => {
if (tableField.toString.startsWith(table)) {
val key = tableField.toString.split("\\.")(1)
val value = properties.get(tableField).toString
map.put(key, value)
}
})
mapMap.put(table, map)
})
mapMap
}
}
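The es/mapping/fieldmapping.properties file is not listed in this document. Based on the parsing logic above (a tables key plus table.field=type entries where the type is one of string/long/double), it would look roughly like the sketch below; the table names, fields and types are only an illustration and must match the real ES mapping:
tables=wechat,mail,search
wechat.phone_mac=string
wechat.longitude=double
wechat.latitude=double
wechat.collect_time=long
mail.send_time=long
mail.mail_content=string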
4、org/apache/spark/streaming/kafka/KafkaManager.scala
KafkaCluster is needed when building the Kafka stream. It lives under org.apache.spark.streaming.kafka and is private to that Spark package, so we create the same package structure in our own module and can then reference it, as shown in the figure below:
package org.apache.spark.streaming.kafka
import com.alibaba.fastjson.TypeReference
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.{Decoder, StringDecoder}
import org.apache.spark.Logging
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import scala.reflect.ClassTag
/**
* 包名说明 :KafkaCluster是私有类,只能在spark包中使用,
* 所以包名保持和 KafkaCluster 一致才能调用
* @param kafkaParams
* @param autoUpdateoffset
*/
class KafkaManager(val kafkaParams:Map[String, String],
val autoUpdateoffset:Boolean =true) extends Serializable with Logging {
//构造一个KafkaCluster
@transient
private var cluster = new KafkaCluster(kafkaParams)
//定义一个单例
def kc(): KafkaCluster = {
if (cluster == null) {
cluster = new KafkaCluster(kafkaParams)
}
cluster
}
/**
* 泛型流读取器
* @param ssc
* @param topics kafka topics,多个topic按","分割
* @tparam K 泛型 K
* @tparam V 泛型 V
* @tparam KD scala泛型 KD <: Decoder[K] 说明KD 的类型必须是Decoder[K]的子类型 上下界
* @tparam VD scala泛型 VD <: Decoder[V] 说明VD 的类型必须是Decoder[V]的子类型 上下界
* @return
*/
def createDirectStream[K: ClassTag, V: ClassTag,
KD <: Decoder[K] : ClassTag,
VD <: Decoder[V] : ClassTag](ssc: StreamingContext, topics: Set[String]): InputDStream[(K, V)] = {
//获取消费者组ID
//val groupId = "test"
val groupId = kafkaParams.get("group.id").getOrElse("default")
// 在zookeeper上读取offsets前先根据实际情况更新offsets
setOrUpdateOffsets(topics, groupId)
//把所有的offsets处理完成,就可以从zookeeper上读取offset开始消费message
val messages = {
//获取kafka分区信息 为了打印信息
val partitionsE = kc.getPartitions(topics)
require(partitionsE.isRight, s"获取 kafka topic ${topics}`s partition 失败。")
val partitions = partitionsE.right.get
println("打印分区信息")
partitions.foreach(println(_))
//获取分区的offset
val consumerOffsetsE = kc.getConsumerOffsets(groupId, partitions)
require(consumerOffsetsE.isRight, s"获取 kafka topic ${topics}`s consumer offsets 失败。")
val consumerOffsets = consumerOffsetsE.right.get
println("打印消费者分区偏移信息")
consumerOffsets.foreach(println(_))
//读取数据
KafkaUtils.createDirectStream[K, V, KD, VD, (K, V)](
ssc, kafkaParams, consumerOffsets, (mmd: MessageAndMetadata[K, V]) => (mmd.key, mmd.message))
}
if (autoUpdateoffset) {
//更新offset
messages.foreachRDD(rdd => {
logInfo("RDD 消费成功,开始更新zookeeper上的偏移")
updateZKOffsets(rdd)
})
}
messages
}
/**
* 创建数据流前,根据实际消费情况更新消费offsets
* @param topics
* @param groupId
*/
private def setOrUpdateOffsets(topics: Set[String], groupId: String): Unit = {
topics.foreach(topic => {
//先获取Kafka offset信息 Kafka partions的节点信息
//获取kafka本身的偏移量, Either类型可以认为就是封装了2种信息
val partitionsE = kc.getPartitions(Set(topic))
logInfo(partitionsE + "")
//require(partitionsE.isRight, "获取partition失败")
require(partitionsE.isRight, s"获取 kafka topic ${topic}`s partition 失败。")
println("partitionsE=" + partitionsE)
val partitions = partitionsE.right.get
println("打印分区信息")
partitions.foreach(println(_))
//获取kafka partions最早的offsets
val earliestLeader = kc.getEarliestLeaderOffsets(partitions)
require(earliestLeader.isRight, "获取earliestLeader失败")
val earliestLeaderOffsets = earliestLeader.right.get
println("kafka最早的消息偏移量")
earliestLeaderOffsets.foreach(println(_))
//获取kafka最末尾的offsets
val latestLeader = kc.getLatestLeaderOffsets(partitions)
//require(latestLeader.isRight, "获取latestLeader失败")
val latestLeaderOffsets = latestLeader.right.get
println("kafka最末尾的消息偏移量")
latestLeaderOffsets.foreach(println(_))
//获取消费者的offsets
val consumerOffsetsE = kc.getConsumerOffsets(groupId, partitions)
//判断消费者是否消费过,消费者offset存在
if (consumerOffsetsE.isRight) {
/**
* 如果zk上保存的offsets已经过时了,即kafka的定时清理策略已经将包含该offsets的文件删除。
* 针对这种情况,只要判断一下zk上的consumerOffsets和earliestLeaderOffsets的大小,
* 如果consumerOffsets比earliestLeaderOffsets还小的话,说明consumerOffsets已过时,
* 这时把consumerOffsets更新为earliestLeaderOffsets
*/
//如果消费过,直接取过来的kafka消费,,earliestLeader 存在
if (earliestLeader.isRight) {
//获取到最早的offset 也就是最小的offset
require(earliestLeader.isRight, "获取earliestLeader失败")
val earliestLeaderOffsets = earliestLeader.right.get
//获取消费者组的offset
val consumerOffsets = consumerOffsetsE.right.get
// 将 consumerOffsets 和 earliestLeaderOffsets 的offsets 做比较
// 可能只是存在部分分区consumerOffsets过时,所以只更新过时分区的consumerOffsets为earliestLeaderOffsets
var offsets: Map[TopicAndPartition, Long] = Map()
consumerOffsets.foreach({ case (tp, n) =>
val earliestLeaderOffset = earliestLeaderOffsets(tp).offset
//if the consumer offset is smaller than kafka's earliest offset it is stale, so reset it to the earliest offset and write it back to zk
if (n < earliestLeaderOffset) {
logWarning("consumer group:" + groupId + ",topic:" + tp.topic + ",partition:" + tp.partition +
" offsets已经过时,更新为" + earliestLeaderOffset)
offsets += (tp -> earliestLeaderOffset)
}
})
//设置offsets
setOffsets(groupId, offsets)
}
} else {
//如果没有消费过,那么就去取kafka获取earliestLeader写到zk中
// 消费者还没有消费过 也就是zookeeper中还没有消费者的信息
if (earliestLeader.isLeft)
logError(s"${topic} hasConsumed but earliestLeaderOffsets is null。")
//看是从头消费还是从末开始消费 smallest表示从头开始消费
val reset = kafkaParams.get("auto.offset.reset").map(_.toLowerCase).getOrElse("smallest")
//往zk中去写,构建消费者 偏移
var leaderOffsets: Map[TopicAndPartition, Long] = Map.empty
//从头消费
if (reset.equals("smallest")) {
//分为 存在 和 不存在 最早的消费记录 两种情况
//如果kafka 最小偏移存在,则将消费者偏移设置为和kafka偏移一样
if (earliestLeader.isRight) {
leaderOffsets = earliestLeader.right.get.map {
case (tp, offset) => (tp, offset.offset)
}
} else {
//如果不存在,则从新构建偏移全部为0 offsets
leaderOffsets = partitions.map(tp => (tp, 0L)).toMap
}
} else {
//直接获取最新的offset
leaderOffsets = kc.getLatestLeaderOffsets(partitions).right.get.map {
case (tp, offset) => (tp, offset.offset)
}
}
//设置offsets 写到zk中
setOffsets(groupId, leaderOffsets)
}
})
}
/**
* 设置消费者组的offsets
* @param groupId
* @param offsets
*/
private def setOffsets(groupId: String, offsets: Map[TopicAndPartition, Long]): Unit = {
if (offsets.nonEmpty) {
//更新offset
val o = kc.setConsumerOffsets(groupId, offsets)
logInfo(s"更新zookeeper中消费组为:${groupId} 的 topic offset信息为: ${offsets}")
if (o.isLeft) {
logError(s"Error updating the offset to Kafka cluster: ${o.left.get}")
}
}
}
/**
* 通过spark的RDD 更新zookeeper上的消费offsets
* @param rdd
*/
def updateZKOffsets[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) : Unit = {
//获取消费者组
val groupId = kafkaParams.get("group.id").getOrElse("default")
//when spark consumes kafka through the low-level API, each partition's offset is kept inside the RDD itself,
//so it can be read here from HasOffsetRanges. Spark does not persist this information to zookeeper,
//which is why we store these offsets in zookeeper ourselves
val offsetsList = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
//打印出spark中保存的offsets信息
offsetsList.foreach(x=>{
println("获取spark 中的偏移信息"+x)
})
for (offsets <- offsetsList) {
//根据topic和partition 构建topicAndPartition
val topicAndPartition = TopicAndPartition(offsets.topic, offsets.partition)
logInfo("将SPARK中的 偏移信息 存到zookeeper中")
//将消费者组的offsets更新到zookeeper中
setOffsets(groupId, Map((topicAndPartition, offsets.untilOffset)))
}
}
//(null,{"rksj":"1558178497","latitude":"24.000000","imsi":"000000000000000"})
//读取kafka流,并将json数据转为map
def createJsonToJMapObjectDirectStreamWithOffset(ssc:StreamingContext, topicsSet:Set[String]): DStream[java.util.Map[String,Object]] = {
//一个转换器
val converter = {json:String =>
println(json)
var res : java.util.Map[String,Object] = null
try {
//JSON转map的操作
res = com.alibaba.fastjson.JSON.parseObject(json,
new TypeReference[java.util.Map[String, Object]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
/**
* 根据converter创建流数据
* @param ssc
* @param topicsSet
* @param converter
* @tparam T
* @return
*/
def createDirectStreamWithOffset[T:ClassTag](ssc:StreamingContext,
topicsSet:Set[String], converter:String => T): DStream[T] = {
createDirectStream[String, String, StringDecoder, StringDecoder](ssc, topicsSet)
.map(pair =>converter(pair._2))
}
def createJsonToJMapDirectStreamWithOffset(ssc:StreamingContext,
topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
val converter = {json:String =>
var res : java.util.Map[String,String] = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
/*
/**
* @param ssc
* @param topicsSet
* @return
*/
def createJsonToJavaBeanDirectStreamWithOffset(ssc:StreamingContext ,
topicsSet:Set[String]): DStream[Object] = {
val converter = {json:String =>
var res : Object = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[Object]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
*/
/*
def createStringDirectStreamWithOffset(ssc:StreamingContext ,
topicsSet:Set[String]): DStream[String] = {
val converter = {json:String =>
json
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
*/
/**
* Reads a JSON stream, converts it into a Map stream, and records the consumed offsets in zookeeper per RDD.
* @param ssc spark StreamingContext
* @param topicsSet set of kafka topics to read from (multiple topics supported)
* @return DStream[java.util.Map[String, String]]
*/
def createJsonToJMapStringDirectStreamWithOffset(ssc:StreamingContext , topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
val converter = {json:String =>
var res : java.util.Map[String,String] = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
/**
* Reads a JSON stream and converts it into a Map stream. Despite the "WithoutOffset" name,
* the current implementation still delegates to createDirectStreamWithOffset, so offsets are tracked in zookeeper as well.
* @param ssc spark StreamingContext
* @param topicsSet set of kafka topics to read from (multiple topics supported)
* @return DStream[java.util.Map[String, String]]
*/
def createJsonToJMapStringDirectStreamWithoutOffset(ssc:StreamingContext , topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
val converter = {json:String =>
var res : java.util.Map[String,String] = null
try {
res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
} catch {
case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
}
res
}
createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
}
}
object KafkaManager extends Logging{
def apply(broker:String, groupId:String = "default",
numFetcher:Int = 1, offset:String = "smallest",
autoUpdateoffset:Boolean = true): KafkaManager ={
new KafkaManager(
createKafkaParam(broker, groupId, numFetcher, offset),
autoUpdateoffset)
}
def createKafkaParam(broker:String, groupId:String = "default",
numFetcher:Int = 1, offset:String = "smallest"): Map[String, String] ={
//创建 stream 时使用的 topic 名字集合
Map[String, String](
"metadata.broker.list" -> broker,
"auto.offset.reset" -> offset,
"group.id" -> groupId,
"num.consumer.fetchers" -> numFetcher.toString)
}
}
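The two pieces above (KafkaManager plus its companion object) are everything a job needs to consume kafka with zookeeper-backed offsets. Below is a minimal usage sketch, not part of the project code; the broker addresses, group id and topic are placeholders, and the real entry point is Kafka2esStreaming in section 7.

```scala
// Hypothetical wiring of KafkaManager into a small streaming job; all names are placeholders.
import org.apache.spark.streaming.kafka.KafkaManager
import com.hsiehchou.spark.common.SparkContextFactory

object KafkaManagerDemo {
  def main(args: Array[String]): Unit = {
    // 10-second micro-batches, created through the same factory Kafka2esStreaming uses
    val ssc = SparkContextFactory.newSparkStreamingContext("KafkaManagerDemo", java.lang.Long.valueOf(10))
    // apply() builds the kafkaParams map (auto.offset.reset=smallest, 1 fetcher) and enables automatic offset commit to zookeeper
    val kafkaManager = KafkaManager("hadoop1:9092,hadoop2:9092,hadoop3:9092", "demo_group")
    // each record arrives as a java.util.Map[String, String] parsed from the JSON payload
    val stream = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, Set("chl_test7"))
    stream.foreachRDD(rdd => println(s"consumed ${rdd.count()} records in this batch"))
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because autoUpdateoffset defaults to true, updateZKOffsets runs after every processed RDD, so a restarted job resumes from the offsets recorded in zookeeper.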
5、resources/log4j.properties
### 设置###
log4j.rootLogger = error,stdout,D,E
### console output ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
### DEBUG and above go to E://logs/log.log ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = E://logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
### ERROR and above go to E://logs/error.log ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =E://logs/error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
6、xz_bigdata_spark/spark/streaming/kafka
Spark_Es_ConfigUtil.scala
package com.hsiehchou.spark.streaming.kafka
import org.apache.spark.Logging
object Spark_Es_ConfigUtil extends Serializable with Logging{
// val ES_NODES = "es.cluster.nodes"
// val ES_PORT = "es.cluster.http.port"
// val ES_CLUSTERNAME = "es.cluster.name"
val ES_NODES = "es.nodes"
val ES_PORT = "es.port"
val ES_CLUSTERNAME = "es.clustername"
def getEsParam(id_field : String): Map[String,String] ={
Map[String ,String]("es.mapping.id" -> id_field,
ES_NODES -> "hadoop1,hadoop2,hadoop3",
//ES_NODES -> "hadoop1",
ES_PORT -> "9200",
ES_CLUSTERNAME -> "xz_es",
"es.batch.size.entries"->"6000",
/* "es.nodes.wan.only"->"true",*/
"es.nodes.discovery"->"true",
"es.batch.size.bytes"->"300000000",
"es.batch.write.refresh"->"false"
)
}
}
Spark_Kafka_ConfigUtil.scala
package com.hsiehchou.spark.streaming.kafka
import org.apache.spark.Logging
object Spark_Kafka_ConfigUtil extends Serializable with Logging{
def getKafkaParam(brokerList:String,groupId : String): Map[String,String]={
val kafkaParam=Map[String,String](
"metadata.broker.list" -> brokerList,
"auto.offset.reset" -> "smallest",
"group.id" -> groupId,
"refresh.leader.backoff.ms" -> "1000",
"num.consumer.fetchers" -> "8")
kafkaParam
}
}
7、kafka2es
Kafka2esJob.scala
package com.hsiehchou.spark.streaming.kafka.kafka2es
import com.hsiehchou.es.admin.AdminUtil
import com.hsiehchou.es.client.ESClientUtils
import com.hsiehchou.spark.common.convert.DataConvert
import com.hsiehchou.spark.streaming.kafka.Spark_Es_ConfigUtil
import org.apache.spark.Logging
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.spark.rdd.EsSpark
object Kafka2esJob extends Serializable with Logging {
/**
* 按日期分组写入ES
* @param dataType
* @param typeDS
*/
def insertData2EsBydate(dataType:String,typeDS:DStream[java.util.Map[String,String]]): Unit ={
//通过 dataType + 日期来动态创建 分索引。 日期格式为 yyyyMMdd
//主要就是时间混杂 通过时间分组就行了 groupby filter
//index前缀 通过对日期进行过滤 避免shuffle操作
val index_prefix = dataType
val client: TransportClient = ESClientUtils.getClient
typeDS.foreachRDD(rdd=>{
//for a small data set a plain rdd.groupBy() would also work
//collect all distinct dates present in this batch
val days = getDays(dataType,rdd)
//filter the data by date; .par is a scala parallel collection, so the per-day writes run concurrently
days.par.foreach(day=>{
//prefix + date forms the dynamic index name, e.g. qq + "_" + "20190508"
val index = index_prefix + "_" + day
//判断索引是否存在
val bool = AdminUtil.indexExists(client,index)
if(!bool){
//如果不存在,创建
val mappingPath = s"es/mapping/${index_prefix}.json"
AdminUtil.buildIndexAndTypes(index, index, mappingPath, 5, 1)
}
//构建RDD,数据类型 某一天的数据RDD
//返回一个map[String,obJECT] 的RDD //就是一个单一类型 单一天数的RDD
val tableRDD = rdd.filter(map=>{
day.equals(map.get("index_date"))
}).map(x=>{
//将map[String,String] 转为map[String,obJECT]
DataConvert.strMap2esObjectMap(x)
})
EsSpark.saveToEs(tableRDD,index+ "/"+index,Spark_Es_ConfigUtil.getEsParam("id"))
})
})
//日期为后
}
/**
* 获取日期的集合
* @param dataType
* @param rdd
* @return
*/
def getDays(dataType:String,rdd:RDD[java.util.Map[String,String]]): Array[String] ={
//对日期去重,然后集中到driver
return rdd.map(x=>{x.get("index_date")}).distinct().collect()
}
/**
* 将RDD转换之后写入ES
* @param dataType
* @param typeRDD
*/
def insertData2Es(dataType:String,typeRDD:RDD[java.util.Map[String,String]]): Unit = {
val index = dataType
val esRDD = typeRDD.map(x=>{
DataConvert.strMap2esObjectMap(x)
})
EsSpark.saveToEs(esRDD,index+ "/"+index,Spark_Es_ConfigUtil.getEsParam("id"))
println("写入ES" + esRDD.count() + "条数据成功")
}
/**
* 将RDD转换后写入ES
* @param dataType
* @param typeDS
*/
def insertData2Es(dataType:String, typeDS:DStream[java.util.Map[String, String]]): Unit = {
val index = dataType
typeDS.foreachRDD(rdd=>{
val esRDD = rdd.map(x=>{
DataConvert.strMap2esObjectMap(x)
})
EsSpark.saveToEs(rdd, dataType+"/"+dataType, Spark_Es_ConfigUtil.getEsParam("id"))
println("写入ES" + esRDD.count() + "条数据成功")
})
}
}
Kafka2esStreaming.scala
package com.hsiehchou.spark.streaming.kafka.kafka2es
import java.util
import java.util.Properties
import com.hsiehchou.common.config.ConfigUtil
import com.hsiehchou.common.project.datatype.DataTypeProperties
import com.hsiehchou.common.time.TimeTranstationUtils
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import org.apache.commons.lang3.StringUtils
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaManager
import scala.collection.JavaConversions._
object Kafka2esStreaming extends Serializable with Logging {
//获取数据类型
private val dataTypes: util.Set[String] = DataTypeProperties.dataTypeMap.keySet()
val kafkaConfig: Properties = ConfigUtil.getInstance().getProperties("kafka/kafka-server-config.properties")
def main(args: Array[String]): Unit = {
//val topics = "chl_test7".split(",")
val topics = args(1).split(",")
// val ssc = SparkConfFactory.newSparkLocalStreamingContext("XZ_kafka2es", java.lang.Long.valueOf(10),1)
val ssc = SparkContextFactory.newSparkStreamingContext("Kafka2esStreaming", java.lang.Long.valueOf(10))
//构建kafkaManager
val kafkaManager = new KafkaManager(
Spark_Kafka_ConfigUtil.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"), "XZ3")
)
//使用kafkaManager创建DStreaming流
val kafkaDS = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
//添加一个日期分组字段
//如果数据其他的转换,可以先在这里进行统一转换
.map(map=>{
map.put("index_date",TimeTranstationUtils.Date2yyyyMMddHHmmss(java.lang.Long.valueOf(map.get("collect_time")+"000")))
map
}).persist(StorageLevel.MEMORY_AND_DISK)
//with a scala .par parallel collection the per-type jobs could be submitted concurrently when resources allow
dataTypes.foreach(datatype=>{
//过滤出单个类别的数据种类
val tableDS = kafkaDS.filter(x=>{datatype.equals(x.get("table"))})
Kafka2esJob.insertData2Es(datatype,tableDS)
})
ssc.start()
ssc.awaitTermination()
}
/**
* 启动参数检查
* @param args
*/
def sparkParamCheck(args: Array[String]): Unit ={
if (args.length == 4) {
if (StringUtils.isBlank(args(1))) {
logInfo("kafka集群地址不能为空")
logInfo("kafka集群地址格式为 主机1名:9092,主机2名:9092,主机3名:9092...")
logInfo("格式为 主机1名:9092,主机2名:9092,主机3名:9092...")
System.exit(-1)
}
if (StringUtils.isBlank(args(2))) {
logInfo("kafka topic1不能为空")
System.exit(-1)
}
if (StringUtils.isBlank(args(3))) {
logInfo("kafka topic2不能为空")
System.exit(-1)
}
}else{
logError("启动参数个数错误")
}
}
def startJob(ds:DStream[String]): Unit ={
}
}
java/com/hsiehchou/spark/common/convert/BaseDataConvert.java
package com.hsiehchou.spark.common.convert;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.HashMap;
import java.util.Map;
public class BaseDataConvert {
private static final Logger LOG = LoggerFactory.getLogger(BaseDataConvert.class);
public static HashMap<String,Object> mapString2Long(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (StringUtils.isNotBlank(logouttime)) {
objectMap.put(key, Long.valueOf(logouttime));
} else {
objectMap.put(key, 0L);
}
return objectMap;
}
public static HashMap<String,Object> mapString2Double(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (StringUtils.isNotBlank(logouttime)) {
objectMap.put(key, Double.valueOf(logouttime));
} else {
objectMap.put(key, 0.000000);
}
return objectMap;
}
public static HashMap<String,Object> mapString2String(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
String logouttime = map.get(key);
if (StringUtils.isNotBlank(logouttime)) {
objectMap.put(key, logouttime);
} else {
objectMap.put(key, "");
}
return objectMap;
}
}
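Kafka2esJob calls DataConvert.strMap2esObjectMap before writing to ES, but that class itself is not listed here; the helpers above are its building blocks. The following is only a rough sketch of what such a converter could look like, assuming collect_time should be indexed as a long and longitude/latitude as doubles; the field names and types are assumptions, so adjust them to the real field standard.

```scala
// Hypothetical sketch only; the real DataConvert.strMap2esObjectMap may differ.
import java.util.HashMap
import scala.collection.JavaConversions._
import com.hsiehchou.spark.common.convert.BaseDataConvert

object DataConvertSketch {
  def strMap2esObjectMap(map: java.util.Map[String, String]): HashMap[String, Object] = {
    val objectMap = new HashMap[String, Object]()
    // numeric fields assumed by this sketch
    BaseDataConvert.mapString2Long(map, "collect_time", objectMap)
    BaseDataConvert.mapString2Double(map, "longitude", objectMap)
    BaseDataConvert.mapString2Double(map, "latitude", objectMap)
    // every remaining field is copied through as a string
    for (key <- map.keySet() if !objectMap.containsKey(key)) {
      BaseDataConvert.mapString2String(map, key, objectMap)
    }
    objectMap
  }
}
```

Typing the values this way keeps the ES mapping consistent with what EsSpark writes, instead of every field arriving as text.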
8、ES动态索引创建
The dynamic index creation logic is the insertData2EsBydate method already shown in Kafka2esJob.scala above: the index name is dataType + "_" + date, the mapping template es/mapping/<dataType>.json is applied the first time a day's index is created, and the batch is filtered per day before EsSpark.saveToEs writes it.
xz_bigdata_es下一节展示代码
9、CDH的java配置和Elasticsearch的配置
CDH JDK path
/usr/local/jdk1.8
Kafka configuration (values as set in CDH)
Default Number of Partitions (num.partitions): 8
Offset Commit Topic Number of Partitions: 180 days
Log Compaction Delete Record Retention Time (log.cleaner.delete.retention.ms): 30 days
Data Log Roll Hours (log.retention.hours / log.roll.hours): 30 days
Java Heap Size of Broker (broker_max_heap_size): 1 GiB
YARN
Container memory: 5g / 5g / 1g / 10g
这里的CDH安装另一篇文章介绍
前提安装好elasticsearch
mkdir /opt/software/elasticsearch/data/
mkdir /opt/software/elasticsearch/logs/
chmod 777 /opt/software/elasticsearch/data/
useradd elasticsearch
passwd elasticsearch
chown -R elasticsearch elasticsearch/
vim /etc/security/limits.conf
Add the following lines:
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
进入limits.d目录下修改配置文件
vim /etc/security/limits.d/90-nproc.conf
Change the nproc line to:
* soft nproc 4096   (set it to this value; on version 6 the default is already 4096)
修改配置sysctl.conf
vim /etc/sysctl.conf
添加下面配置:
vm.max_map_count=655360
并执行命令:
sysctl -p
hadoop1的conf配置
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: xz_es
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
node.name: node-1
node.master: true
node.data: true
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
path.data: /opt/software/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
path.logs: /opt/software/elasticsearch/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 192.168.116.201
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
jvm.options
修改下
-Xms64m
-Xmx64m
hadoop2的conf配置
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: xz_es
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: node-2
node.master: false
node.data: true
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
path.data: /opt/software/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
path.logs: /opt/software/elasticsearch/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 192.168.116.202
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
jvm.options
修改下
-Xms64m
-Xmx64m
hadoop3的conf配置
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: xz_es
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: node-3
node.master: false
node.data: true
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
path.data: /opt/software/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
path.logs: /opt/software/elasticsearch/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 192.168.116.203
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
jvm.options
修改下
-Xms64m
-Xmx64m
Kibana的conf配置
kibana.yml
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
#server.host: "localhost"
server.host: "192.168.116.202"
# Enables you to specify a path to mount Kibana at if you are running behind a proxy. This only affects
# the URLs generated by Kibana, your proxy is expected to remove the basePath value before forwarding requests
# to Kibana. This setting cannot end in a slash.
#server.basePath: ""
# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URL of the Elasticsearch instance to use for all your queries.
#elasticsearch.url: "http://localhost:9200"
elasticsearch.url: "http://192.168.116.201:9200"
运行Elasticsearch
cd /opt/software/elasticsearch
su elasticsearch
bin/elasticsearch &
运行Kibana
cd /opt/software/kibana/
bin/kibana &
10、kafka2es打包到集群执行
打包
使用maven工具点击install
放入集群
将打包完成的jar文件和xz_bigdata_spark-1.0-SNAPSHOT.jar 一起放入/usr/chl/spark7/目录下面
执行
spark-submit \
  --master yarn-cluster \
  --num-executors 1 \
  --driver-memory 500m \
  --executor-memory 1g \
  --executor-cores 1 \
  --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
  --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming \
  /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar chl_test7 chl_test7
The same command, flag by flag:
spark-submit
  --master yarn-cluster      //run on the YARN cluster
  --num-executors 1          //number of executor processes
  --driver-memory 500m       //driver memory
  --executor-memory 1g       //memory per executor
  --executor-cores 1         //cores (threads) per executor
  --jars $(echo /usr/chl/spark8/jars/*.jar | tr ' ' ',')   //dependency jars
  --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
11、运行截图
12、冲突查找快捷键
Ctrl+Alt+Shift+N
八、xz_bigdata_es开发
1、pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_es</artifactId>
<name>xz_bigdata_es</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>6.2.3</version>
</dependency>
<dependency>
<groupId>io.searchbox</groupId>
<artifactId>jest</artifactId>
<version>6.3.1</version>
</dependency>
</dependencies>
</project>
2、admin
AdminUtil.java
package com.hsiehchou.es.admin;
import com.hsiehchou.common.file.FileCommon;
import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class AdminUtil {
private static Logger LOG = LoggerFactory.getLogger(AdminUtil.class);
public static void main(String[] args) throws Exception{
//create the index and its mapping
AdminUtil.buildIndexAndTypes("tanslator_test1111","tanslator_test1111", "es/mapping/test.json",3,1);
//index = 类型+日期
//查找类 Ctrl+Shift+Alt+N
}
/**
* @param index
* @param type
* @param path
* @param shard
* @param replication
* @return
* @throws Exception
*/
public static boolean buildIndexAndTypes(String index,String type,String path,int shard,int replication) throws Exception{
boolean flag ;
TransportClient client = ESClientUtils.getClient();
String mappingJson = FileCommon.getAbstractPath(path);
boolean indices = AdminUtil.createIndices(client, index, shard, replication);
if(indices){
LOG.info("创建索引"+ index + "成功");
flag = MappingUtil.addMapping(client, index, type, mappingJson);
}
else{
LOG.error("创建索引"+ index + "失败");
flag = false;
}
return flag;
}
/**
* @desc 判断需要创建的index是否存在
* */
public static boolean indexExists(TransportClient client,String index){
boolean ifExists = false;
try {
System.out.println("client===" + client);
IndicesExistsResponse existsResponse = client.admin().indices().prepareExists(index).execute().actionGet();
ifExists = existsResponse.isExists();
} catch (Exception e) {
e.printStackTrace();
LOG.error("判断index是否存在失败...");
return ifExists;
}
return ifExists;
}
/**
* 创建索引
* @param client
* @param index
* @param shard
* @param replication
* @return
*/
public static boolean createIndices(TransportClient client, String index, int shard , int replication){
if(!indexExists(client,index)) {
LOG.info("该index不存在,创建...");
CreateIndexResponse createIndexResponse =null;
try {
createIndexResponse = client.admin().indices().prepareCreate(index)
.setSettings(Settings.builder()
.put("index.number_of_shards", shard)
.put("index.number_of_replicas", replication)
.put("index.codec", "best_compression")
.put("refresh_interval", "30s"))
.execute().actionGet();
return createIndexResponse.isAcknowledged();
} catch (Exception e) {
LOG.error(null, e);
return false;
}
}
LOG.warn("该index " + index + " 已经存在...");
return false;
}
}
MappingUtil.java
package com.hsiehchou.es.admin;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
public class MappingUtil {
private static Logger LOG = LoggerFactory.getLogger(MappingUtil.class);
//关闭自动添加字段,关闭后索引数据中如果有多余字段不会修改mapping,默认true
private boolean dynamic = true;
public static XContentBuilder buildMapping(String tableName) throws IOException {
XContentBuilder builder = null;
try {
builder = XContentFactory.jsonBuilder().startObject()
.startObject(tableName)
.startObject("_source").field("enabled", true).endObject()
.startObject("properties")
.startObject("id").field("type", "long").endObject()
.startObject("sn").field("type", "text").endObject()
.endObject()
.endObject()
.endObject();
} catch (IOException e) {
e.printStackTrace();
}
return builder;
}
public static boolean addMapping(TransportClient client, String index, String type, String jsonString){
PutMappingResponse putMappingResponse = null;
try {
PutMappingRequest mappingRequest = new PutMappingRequest(index)
.type(type).source(JSON.parseObject(jsonString));
putMappingResponse = client.admin().indices().putMapping(mappingRequest).actionGet();
} catch (Exception e) {
LOG.error(null,e);
e.printStackTrace();
LOG.error("添加" + type + "的mapping失败....",e);
return false;
}
boolean success = putMappingResponse.isAcknowledged();
if (success){
LOG.info("创建" + type + "的mapping成功....");
return success;
}
return success;
}
public static void main(String[] args) throws Exception {
/*String singleConf = ConsulConfigUtil.getSingleConf("es6.1.0/mapping/http");
int i = singleConf.length() / 2;
System.out.println(i);*/
}
}
3、client
ESClientUtils.java
package com.hsiehchou.es.client;
import com.hsiehchou.common.config.ConfigUtil;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.Serializable;
import java.net.InetAddress;
import java.util.Properties;
/**
* ES 客户端获取
*/
public class ESClientUtils implements Serializable{
private static Logger LOG = LoggerFactory.getLogger(ESClientUtils.class);
private volatile static TransportClient esClusterClient;
private ESClientUtils(){}
private static Properties properties;
static {
properties = ConfigUtil.getInstance().getProperties("es/es_cluster.properties");
}
public static TransportClient getClient(){
System.setProperty("es.set.netty.runtime.available.processors", "false");
String clusterName = properties.getProperty("es.cluster.name");
String clusterNodes1 = properties.getProperty("es.cluster.nodes1");
String clusterNodes2 = properties.getProperty("es.cluster.nodes2");
String clusterNodes3 = properties.getProperty("es.cluster.nodes3");
LOG.info("clusterName:"+ clusterName);
LOG.info("clusterNodes:"+ clusterNodes1);
LOG.info("clusterNodes:"+ clusterNodes2);
LOG.info("clusterNodes:"+ clusterNodes3);
if(esClusterClient==null){
synchronized (ESClientUtils.class){
if(esClusterClient==null){
try{
Settings settings = Settings.builder()
.put("cluster.name", clusterName)
//.put("searchguard.ssl.transport.enabled", false)
//.put("xpack.security.user", "sc_xy_mn_es:xy@66812.com")
// .put("transport.type","netty3")
// .put("http.type","netty3")
.put("client.transport.sniff",true).build();//开启自动嗅探功能
esClusterClient = new PreBuiltTransportClient(settings)
.addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes1), 9300))
.addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes2), 9300))
.addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes3), 9300));
LOG.info("esClusterClient========" + esClusterClient.listedNodes());
}catch (Exception e){
LOG.error("获取客户端失败",e);
}finally {
}
}
}
}
return esClusterClient;
}
public static void main(String[] args) {
TransportClient client = ESClientUtils.getClient();
System.out.println(client);
}
}
4、jest/service
IndexTypeUtil.java
package com.hsiehchou.es.jest.service;
import com.hsiehchou.common.config.JsonReader;
import io.searchbox.client.JestClient;
public class IndexTypeUtil {
public static void main(String[] args) {
IndexTypeUtil.createIndexAndType("tanslator","es/mapping/tanslator.json");
// IndexTypeUtil.createIndexAndType("task");
// IndexTypeUtil.createIndexAndType("ability");
// IndexTypeUtil.createIndexAndType("paper");
}
public static void createIndexAndType(String index,String jsonPath){
try{
JestClient jestClient = JestService.getJestClient();
JestService.createIndex(jestClient, index);
JestService.createIndexMapping(jestClient,index,index,getSourceFromJson(jsonPath));
}catch (Exception e){
e.printStackTrace();
//LOG.error("创建索引失败",e);
}
}
public static String getSourceFromJson(String path){
return JsonReader.readJson(path);
}
public static String getSource(String index){
if(index.equals("task")){
return "{\"_source\": {\n" +
" \"enabled\": true\n" +
" },\n" +
" \"properties\": {\n" +
" \"taskwordcount\": {\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"taskprice\": {\n" +
" \"type\": \"float\"\n" +
" }\n" +
" }\n" +
"}";
}
if(index.equals("tanslator")){
return "{\n" +
" \"_source\": {\n" +
" \"enabled\": true\n" +
" },\n" +
" \"properties\": {\n" +
" \"birthday\": {\n" +
" \"type\": \"text\",\n" +
" \"fields\": {\n" +
" \"keyword\": {\n" +
" \"ignore_above\": 256,\n" +
" \"type\": \"keyword\"\n" +
" }\n" +
" }\n" +
" },\n" +
" \"createtime\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"updatetime\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"avgcooperation\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"cooperationwordcount\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"cooperation\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"cooperationtime\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"age\":{\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"industry\": {\n" +
" \"type\": \"nested\",\n" +
" \"properties\": {\n" +
" \"industryname\": {\n" +
" \"type\": \"text\",\n" +
" \"fields\": {\n" +
" \"keyword\": {\n" +
" \"ignore_above\": 256,\n" +
" \"type\": \"keyword\"\n" +
" }\n" +
" }\n" +
" },\n" +
" \"count\": {\n" +
" \"type\": \"long\"\n" +
" },\n" +
" \"industryid\": {\n" +
" \"type\": \"text\",\n" +
" \"fields\": {\n" +
" \"keyword\": {\n" +
" \"ignore_above\": 256,\n" +
" \"type\": \"keyword\"\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"\n" +
" }\n" +
"}";
}
return "";
}
}
JestService.java
package com.hsiehchou.es.jest.service;
import com.hsiehchou.common.file.FileCommon;
import com.google.gson.GsonBuilder;
import io.searchbox.action.Action;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.JestResult;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.*;
import io.searchbox.indices.CreateIndex;
import io.searchbox.indices.DeleteIndex;
import io.searchbox.indices.IndicesExists;
import io.searchbox.indices.mapping.GetMapping;
import io.searchbox.indices.mapping.PutMapping;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.List;
import java.util.Map;
public class JestService {
private static Logger LOG = LoggerFactory.getLogger(JestService.class);
/**
* 获取JestClient对象
*
* @return
*/
public static JestClient getJestClient() {
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig
.Builder("http://hadoop1:9200")
//.defaultCredentials("sc_xy_mn_es","xy@66812.com")
.gson(new GsonBuilder().setDateFormat("yyyy-MM-dd'T'hh:mm:ss").create())
.connTimeout(1500)
.readTimeout(3000)
.multiThreaded(true)
.build());
return factory.getObject();
}
public static void main(String[] args) throws Exception {
JestClient jestClient = null;
// Map<String, Long> stringLongMap = null;
List<Map<String, Object>> maps = null;
try {
jestClient = JestService.getJestClient();
/* SearchResult aggregation = JestService.aggregation(jestClient,
"wechat",
"wechat",
"collect_time");
stringLongMap = ResultParse.parseAggregation(aggregation);*/
/* SearchResult search = search(jestClient,
"wechat",
"wechat",
"id",
"65a3d548bd3e42b1972191bc2bd2829b",
"collect_time",
"desc",
1,
2);*/
/*SearchResult search = search(jestClient,
"",
"",
"phone_mac",
"aa-aa-aa-aa-aa-aa",
"collect_time",
"asc",
1,
1000);*/
// System.out.println(indexExists(jestClient,"wechat"));
System.out.println("wechat数据量:"+count(jestClient,"wechat","wechat"));
System.out.println(aggregation(jestClient,"wechat","wechat", "phone"));
String[] includes = new String[]{"latitude","longitude","collect_time"};
// try{
SearchResult search = JestService.search(jestClient,
"",
"",
"phone_mac.keyword",
"aa-aa-aa-aa-aa-aa",
"collect_time",
"asc",
1,
2000);
maps = ResultParse.parseSearchResultOnly(search);
System.out.println(maps.size());
System.out.println(maps);
} catch (Exception e) {
e.printStackTrace();
} finally {
JestService.closeJestClient(jestClient);
}
System.out.println(maps);
// } catch (Exception e) {
// e.printStackTrace();
// }finally {
// JestService.closeJestClient(jestClient);
// }
// System.out.println(stringLongMap);
}
/**
* 统计一个索引所有数据
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static Long count(JestClient jestClient,
String indexName,
String typeName) throws Exception {
Count count = new Count.Builder()
.addIndex(indexName)
.addType(typeName)
.build();
CountResult results = jestClient.execute(count);
return results.getCount().longValue();
}
/**
* 聚合分组查询
* @param jestClient
* @param indexName
* @param typeName
* @param field
* @return
* @throws Exception
*/
public static SearchResult aggregation(JestClient jestClient, String indexName, String typeName, String field) throws Exception {
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//分组聚合API
AggregationBuilder group1 = AggregationBuilders.terms("group1").field(field);
//group1.subAggregation(AggregationBuilders.terms("group2").field(query));
searchSourceBuilder.aggregation(group1);
searchSourceBuilder.size(0);
System.out.println(searchSourceBuilder.toString());
Search search = new Search.Builder(searchSourceBuilder.toString())
.addIndex(indexName)
.addType(typeName).build();
SearchResult result = jestClient.execute(search);
return result;
}
//基础封装
public static SearchResult search(
JestClient jestClient,
String indexName,
String typeName,
String field,
String fieldValue,
String sortField,
String sortValue,
int pageNumber,
int pageSize,
String[] includes) {
//构造一个查询体 封装的就是查询语句
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.fetchSource(includes,new String[0]);
//查询构造器
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if(StringUtils.isEmpty(field)){
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
}else{
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
}
searchSourceBuilder.query(boolQueryBuilder);
//定义分页
//从什么时候开始
searchSourceBuilder.from((pageNumber-1)*pageSize);
searchSourceBuilder.size(pageSize);
//设置排序
if("desc".equals(sortValue)){
searchSourceBuilder.sort(sortField,SortOrder.DESC);
}else{
searchSourceBuilder.sort(sortField,SortOrder.ASC);
}
System.out.println("sql =====" + searchSourceBuilder.toString());
//构造一个查询执行器
Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
//设置indexName typeName
if(StringUtils.isNotBlank(indexName)){
builder.addIndex(indexName);
}
if(StringUtils.isNotBlank(typeName)){
builder.addType(typeName);
}
Search build = builder.build();
SearchResult searchResult = null;
try {
searchResult = jestClient.execute(build);
} catch (IOException e) {
LOG.error("查询失败",e);
}
return searchResult;
}
//基础封装
public static SearchResult search(
JestClient jestClient,
String indexName,
String typeName,
String field,
String fieldValue,
String sortField,
String sortValue,
int pageNumber,
int pageSize) {
//构造一个查询体 封装的就是查询语句
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//查询构造器
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if(StringUtils.isEmpty(field)){
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
}else{
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
}
searchSourceBuilder.query(boolQueryBuilder);
//定义分页
//从什么时候开始
searchSourceBuilder.from((pageNumber-1)*pageSize);
searchSourceBuilder.size(pageSize);
//设置排序
if("desc".equals(sortValue)){
searchSourceBuilder.sort(sortField,SortOrder.DESC);
}else{
searchSourceBuilder.sort(sortField,SortOrder.ASC);
}
System.out.println("sql =====" + searchSourceBuilder.toString());
//构造一个查询执行器
Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
//设置indexName typeName
if(StringUtils.isNotBlank(indexName)){
builder.addIndex(indexName);
}
if(StringUtils.isNotBlank(typeName)){
builder.addType(typeName);
}
Search build = builder.build();
SearchResult searchResult = null;
try {
searchResult = jestClient.execute(build);
} catch (IOException e) {
LOG.error("查询失败",e);
}
return searchResult;
}
/* //基础封装
public static SearchResult search(
JestClient jestClient,
String indexName,
String typeName,
String field,
String fieldValue,
String sortField,
String sortValue,
int pageNumber,
int pageSize) {
//构造一个查询体 封装的就是查询语句
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//查询构造器
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if(StringUtils.isEmpty(field)){
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
}else{
boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
}
searchSourceBuilder.query(boolQueryBuilder);
//定义分页
//从什么时候开始
searchSourceBuilder.from((pageNumber-1)*pageSize);
searchSourceBuilder.size(pageSize);
//设置排序
if("desc".equals(sortValue)){
searchSourceBuilder.sort(sortField,SortOrder.DESC);
}else{
searchSourceBuilder.sort(sortField,SortOrder.ASC);
}
System.out.println("sql =====" + searchSourceBuilder.toString());
//构造一个查询执行器
Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
//设置indexName typeName
if(StringUtils.isNotBlank(indexName)){
builder.addIndex(indexName);
}
if(StringUtils.isNotBlank(typeName)){
builder.addType(typeName);
}
Search build = builder.build();
SearchResult searchResult = null;
try {
searchResult = jestClient.execute(build);
} catch (IOException e) {
LOG.error("查询失败",e);
}
return searchResult;
}
*/
/**
* 判断索引是否存在
*
* @param jestClient
* @param indexName
* @return
* @throws Exception
*/
public static boolean indexExists(JestClient jestClient, String indexName) {
JestResult result = null;
try {
Action action = new IndicesExists.Builder(indexName).build();
result = jestClient.execute(action);
} catch (IOException e) {
LOG.error(null, e);
}
return result != null && result.isSucceeded();
}
/**
* 创建索引
*
* @param jestClient
* @param indexName
* @return
* @throws Exception
*/
public static boolean createIndex(JestClient jestClient, String indexName) throws Exception {
if (!JestService.indexExists(jestClient, indexName)) {
JestResult jr = jestClient.execute(new CreateIndex.Builder(indexName).build());
return jr.isSucceeded();
} else {
LOG.info("该索引已经存在");
return false;
}
}
public static boolean createIndexWithSettingsMapAndMappingsString(JestClient jestClient, String indexName, String type, String path) throws Exception {
// String mappingJson = "{\"type1\": {\"_source\":{\"enabled\":false},\"properties\":{\"field1\":{\"type\":\"keyword\"}}}}";
String mappingJson = FileCommon.getAbstractPath(path);
String realMappingJson = "{" + type + ":" + mappingJson + "}";
System.out.println(realMappingJson);
CreateIndex createIndex = new CreateIndex.Builder(indexName)
.mappings(realMappingJson)
.build();
JestResult jr = jestClient.execute(createIndex);
return jr.isSucceeded();
}
/**
* Put映射
*
* @param jestClient
* @param indexName
* @param typeName
* @param source
* @return
* @throws Exception
*/
public static boolean createIndexMapping(JestClient jestClient, String indexName, String typeName, String source) throws Exception {
PutMapping putMapping = new PutMapping.Builder(indexName, typeName, source).build();
JestResult jr = jestClient.execute(putMapping);
return jr.isSucceeded();
}
/**
* Get映射
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static String getIndexMapping(JestClient jestClient, String indexName, String typeName) throws Exception {
GetMapping getMapping = new GetMapping.Builder().addIndex(indexName).addType(typeName).build();
JestResult jr = jestClient.execute(getMapping);
return jr.getJsonString();
}
/**
* 索引文档
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static boolean index(JestClient jestClient, String indexName, String typeName, String idField, List<Map<String, Object>> listMaps) throws Exception {
Bulk.Builder bulk = new Bulk.Builder().defaultIndex(indexName).defaultType(typeName);
for (Map<String, Object> map : listMaps) {
if (map != null && map.containsKey(idField)) {
Object o = map.get(idField);
Index index = new Index.Builder(map).id(map.get(idField).toString()).build();
bulk.addAction(index);
}
}
BulkResult br = jestClient.execute(bulk.build());
return br.isSucceeded();
}
/**
* 索引文档
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static boolean indexString(JestClient jestClient, String indexName, String typeName, String idField, List<Map<String, String>> listMaps) throws Exception {
if (listMaps != null && listMaps.size() > 0) {
Bulk.Builder bulk = new Bulk.Builder().defaultIndex(indexName).defaultType(typeName);
for (Map<String, String> map : listMaps) {
if (map != null && map.containsKey(idField)) {
Index index = new Index.Builder(map).id(map.get(idField)).build();
bulk.addAction(index);
}
}
BulkResult br = jestClient.execute(bulk.build());
return br.isSucceeded();
} else {
return false;
}
}
/**
* 索引文档
*
* @param jestClient
* @param indexName
* @param typeName
* @return
* @throws Exception
*/
public static boolean indexOne(JestClient jestClient, String indexName, String typeName, String id, Map<String, Object> map) {
Index.Builder builder = new Index.Builder(map);
builder.id(id);
builder.refresh(true);
Index index = builder.index(indexName).type(typeName).build();
try {
JestResult result = jestClient.execute(index);
if (result != null && !result.isSucceeded()) {
throw new RuntimeException(result.getErrorMessage() + "插入更新索引失败!");
}
} catch (Exception e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* 搜索文档
*
* @param jestClient
* @param indexName
* @param typeName
* @param query
* @return
* @throws Exception
*/
public static SearchResult search(JestClient jestClient, String indexName, String typeName, String query) throws Exception {
Search search = new Search.Builder(query)
.addIndex(indexName)
.addType(typeName)
.build();
return jestClient.execute(search);
}
/**
* Get文档
*
* @param jestClient
* @param indexName
* @param typeName
* @param id
* @return
* @throws Exception
*/
public static JestResult get(JestClient jestClient, String indexName, String typeName, String id) throws Exception {
Get get = new Get.Builder(indexName, id).type(typeName).build();
return jestClient.execute(get);
}
/**
* Delete索引
*
* @param jestClient
* @param indexName
* @return
* @throws Exception
*/
public boolean delete(JestClient jestClient, String indexName) throws Exception {
JestResult jr = jestClient.execute(new DeleteIndex.Builder(indexName).build());
return jr.isSucceeded();
}
/**
* Delete文档
*
* @param jestClient
* @param indexName
* @param typeName
* @param id
* @return
* @throws Exception
*/
public static boolean delete(JestClient jestClient, String indexName, String typeName, String id) throws Exception {
DocumentResult dr = jestClient.execute(new Delete.Builder(id).index(indexName).type(typeName).build());
return dr.isSucceeded();
}
/**
* 关闭JestClient客户端
*
* @param jestClient
* @throws Exception
*/
public static void closeJestClient(JestClient jestClient) {
if (jestClient != null) {
jestClient.shutdownClient();
}
}
public static String query = "{\n" +
" \"size\": 1,\n" +
" \"query\": {\n" +
" \"match\": {\n" +
" \"taskexcuteid\": \"89899143\"\n" +
" }\n" +
" },\n" +
" \"aggs\": {\n" +
" \"count\": {\n" +
" \"terms\": {\n" +
" \"field\": \"source.keyword\"\n" +
" },\n" +
" \"aggs\": {\n" +
" \"sum_price\": {\n" +
" \"sum\": {\n" +
" \"field\": \"taskprice\"\n" +
" }\n" +
" },\n" +
" \"sum_wordcount\": {\n" +
" \"sum\": {\n" +
" \"field\": \"taskwordcount\"\n" +
" }\n" +
" },\n" +
" \"avg_taskprice\": {\n" +
" \"avg\": {\n" +
" \"field\": \"taskprice\"\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
}
ResultParse.java
package com.hsiehchou.es.jest.service;
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonPrimitive;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestResult;
import io.searchbox.core.SearchResult;
import io.searchbox.core.search.aggregation.MetricAggregation;
import io.searchbox.core.search.aggregation.TermsAggregation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.*;
public class ResultParse {
private static Logger LOG = LoggerFactory.getLogger(ResultParse.class);
public static void main(String[] args) throws Exception {
JestClient jestClient = JestService.getJestClient();
/*long l = System.currentTimeMillis();
JestClient jestClient = JestClientUtil.getJestClient();
System.out.println(jestClient);
String json ="{\n" +
" \"size\": 1, \n" +
" \"query\": {\n" +
" \"query_string\": {\n" +
" \"query\": \"中文\"\n" +
" }\n" +
" },\n" +
" \"highlight\": {\n" +
" \"pre_tags\" : [ \"<red>\" ],\n" +
" \"post_tags\" : [ \"</red>\" ],\n" +
" \"fields\":{\n" +
" \"secondlanguage\": {}\n" +
" ,\"firstlanguage\": {}\n" +
" }\n" +
" }\n" +
"}";
SearchResult search = JestService.search(jestClient, ES_INDEX.TANSLATOR_TEST, ES_INDEX.TANSLATOR_TEST,json);
ResultParse.parseSearchResult(search);
jestClient.shutdownClient();
long l1 = System.currentTimeMillis();
System.out.println(l1-l);*/
}
public static Map<String,Object> parseGet(JestResult getResult){
Map<String,Object> map = null;
JsonObject jsonObject = getResult.getJsonObject().getAsJsonObject("_source");
if(jsonObject != null){
map = new HashMap<String,Object>();
//System.out.println(jsonObject);
Set<Map.Entry<String, JsonElement>> entries = jsonObject.entrySet();
for(Map.Entry<String, JsonElement> entry:entries){
JsonElement value = entry.getValue();
if(value.isJsonPrimitive()){
JsonPrimitive value1 = (JsonPrimitive) value;
// LOG.error("转换前==========" + value1);
if( value1.isString() ){
// LOG.error("转换后==========" + value1.getAsString());
map.put(entry.getKey(),value1.getAsString());
}else{
map.put(entry.getKey(),value1);
}
}else{
map.put(entry.getKey(),value);
}
}
}
return map;
}
public static Map<String,Object> parseGet2map(JestResult getResult){
JsonObject source = getResult.getJsonObject().getAsJsonObject("_source");
Gson gson = new Gson();
Map map = gson.fromJson(source, Map.class);
return map;
}
/**
* Extract only the _source of every hit from a SearchResult
* @param search
* @return a list of source maps, one entry per hit
*/
public static List<Map<String,Object>> parseSearchResultOnly(SearchResult search){
List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();
List<SearchResult.Hit<Object, Void>> hits = search.getHits(Object.class);
for(SearchResult.Hit<Object, Void> hit : hits){
Map<String,Object> source = (Map<String,Object>)hit.source;
list.add(source);
}
return list;
}
/**
* Parse the terms aggregation named "group1" from a SearchResult
* @param search
* @return map of bucket key -> document count
*/
public static Map<String,Long> parseAggregation(SearchResult search){
Map<String,Long> mapResult = new HashMap<>();
MetricAggregation aggregations = search.getAggregations();
TermsAggregation group1 = aggregations.getTermsAggregation("group1");
List<TermsAggregation.Entry> buckets = group1.getBuckets();
buckets.forEach(x->{
String key = x.getKey();
Long count = x.getCount();
mapResult.put(key,count);
});
return mapResult;
}
/**
* 解析listMap
* 结果格式为 {hits=0, total=0, data=[]}
* @param search
* @return
*/
public static Map<String,Object> parseSearchResult(SearchResult search){
Map<String,Object> map = new HashMap<String,Object>();
List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();
Long total = search.getTotal();
map.put("total",total);
List<SearchResult.Hit<Object, Void>> hits = search.getHits(Object.class);
map.put("hits",hits.size());
for(SearchResult.Hit<Object, Void> hit : hits){
Map<String, List<String>> highlight = hit.highlight;
Map<String,Object> source = (Map<String,Object>)hit.source;
source.put("highlight",highlight);
list.add(source);
}
map.put("data",list);
return map;
}
}
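Putting JestService and ResultParse together: a small hypothetical driver. The index, type and query body are sample values, and JestService.search(...) is the search wrapper referenced in the commented-out main above.
ResultParseDemo.java
package com.hsiehchou.es.jest.service;
import io.searchbox.client.JestClient;
import io.searchbox.core.SearchResult;
import java.util.Map;
public class ResultParseDemo {
    public static void main(String[] args) throws Exception {
        JestClient jestClient = JestService.getJestClient();
        // simple match query against the sample wechat index
        String json = "{\"size\": 10, \"query\": {\"match\": {\"phone_mac\": \"aa-aa-aa-aa-aa-aa\"}}}";
        SearchResult search = JestService.search(jestClient, "wechat", "wechat", json);
        // parseSearchResult returns a map with the keys total, hits and data
        Map<String, Object> result = ResultParse.parseSearchResult(search);
        System.out.println("total=" + result.get("total") + ", page=" + result.get("data"));
        JestService.closeJestClient(jestClient);
    }
}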
5、search
BuilderUtil.java
package com.hsiehchou.es.search;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class BuilderUtil {
private static Logger LOG = LoggerFactory.getLogger(BuilderUtil.class);
public static SearchRequestBuilder getSearchBuilder(TransportClient client, String index, String type){
SearchRequestBuilder builder = null;
try {
if (StringUtils.isNotBlank(index)) {
builder = client.prepareSearch(index.split(","));
} else {
builder = client.prepareSearch();
}
if (StringUtils.isNotBlank(type)) {
builder.setTypes(type.split(","));
}
} catch (Exception e) {
LOG.error(null, e);
}
return builder;
}
public static SearchRequestBuilder getSearchBuilder(TransportClient client, String[] indexs, String type){
SearchRequestBuilder builder = null;
try {
if (indexs.length > 0) {
// prepareSearch accepts all index names at once (varargs)
builder = client.prepareSearch(indexs);
} else {
builder = client.prepareSearch();
}
if (StringUtils.isNotBlank(type)) {
builder.setTypes(type);
}
} catch (Exception e) {
LOG.error(null, e);
}
return builder;
}
}
QueryUtil.java
package com.hsiehchou.es.search;
import com.hsiehchou.es.utils.UnicodeUtil;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
public class QueryUtil {
private static Logger LOG = LoggerFactory.getLogger(QueryUtil.class);
/**
* EQ 等於
* NEQ 不等於
* GE 大于等于
* GT 大于
* LE 小于等于
* LT 小于
* RANGE 区间范围
*/
public static enum OPREATOR {EQ, NEQ,WILDCARD, GE, LE, GT, LT, FUZZY, RANGE, IN, PREFIX}
/**
* @param paramMap
* @return
*/
public static BoolQueryBuilder getSearchParam(Map<OPREATOR, Map<String, Object>> paramMap) {
BoolQueryBuilder qb = QueryBuilders.boolQuery();
if (null != paramMap && !paramMap.isEmpty()) {
for (Map.Entry<OPREATOR, Map<String, Object>> paramEntry : paramMap.entrySet()) {
OPREATOR key = paramEntry.getKey();
Map<String, Object> fieldMap = paramEntry.getValue();
for (Map.Entry<String, Object> fieldEntry : fieldMap.entrySet()) {
String field = fieldEntry.getKey();
Object value = fieldEntry.getValue();
switch (key) {
case EQ:/**等於查詢 equale**/
qb.must(QueryBuilders.matchPhraseQuery(field, value).slop(0));
break;
case NEQ:/**不等於查詢 not equale**/
qb.mustNot(QueryBuilders.matchQuery(field, value));
break;
case GE: /**大于等于查詢 great than or equal to**/
qb.must(QueryBuilders.rangeQuery(field).gte(value));
break;
case LE: /**小于等于查詢 less than or equal to**/
qb.must(QueryBuilders.rangeQuery(field).lte(value));
break;
case GT: /**大于查詢**/
qb.must(QueryBuilders.rangeQuery(field).gt(value));
break;
case LT: /**小于查詢**/
qb.must(QueryBuilders.rangeQuery(field).lt(value));
break;
case FUZZY:
String text = String.valueOf(value);
if (!UnicodeUtil.hasChinese(text)) {
text = "*" + text + "*";
}
text = QueryParser.escape(text);
qb.must(new QueryStringQueryBuilder(text).field(field));
break;
case RANGE: /**区间查詢**/
String[] split = value.toString().split(",");
if(split.length==2){
qb.must(QueryBuilders.rangeQuery(field).from(Long.valueOf(split[0]))
.to(Long.valueOf(split[1])));
}
/* if (value instanceof Map) {
Map<String, Object> rangMap = (Map<String, Object>) value;
qb.must(QueryBuilders.rangeQuery(field).from(rangMap.get("ge"))
.to(rangMap.get("le")));
}*/
break;
case PREFIX: /**前缀查詢**/
qb.must(QueryBuilders.prefixQuery(field, String.valueOf(value)));
break;
case IN:
qb.must(QueryBuilders.termsQuery(field, (Object[]) value));
break;
default:
qb.must(QueryBuilders.matchQuery(field, value));
break;
}
}
}
}
return qb;
}
}
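A sketch of how the OPREATOR parameter map is assembled before calling getSearchParam. Field names and values are samples taken from the common fields defined earlier; note that RANGE expects a "from,to" string of two numbers, matching the split logic above.
QueryUtilDemo.java
package com.hsiehchou.es.search;
import org.elasticsearch.index.query.BoolQueryBuilder;
import java.util.HashMap;
import java.util.Map;
public class QueryUtilDemo {
    public static void main(String[] args) {
        Map<QueryUtil.OPREATOR, Map<String, Object>> paramMap = new HashMap<>();
        // EQ translates to an exact phrase match on the field
        Map<String, Object> eq = new HashMap<>();
        eq.put("phone_mac", "aa-aa-aa-aa-aa-aa");
        paramMap.put(QueryUtil.OPREATOR.EQ, eq);
        // RANGE takes "from,to" as a single comma-separated string (sample epoch seconds)
        Map<String, Object> range = new HashMap<>();
        range.put("collect_time", "1561651200,1561737600");
        paramMap.put(QueryUtil.OPREATOR.RANGE, range);
        BoolQueryBuilder qb = QueryUtil.getSearchParam(paramMap);
        // prints the generated bool query as JSON
        System.out.println(qb);
    }
}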
ResponseParse.java
package com.hsiehchou.es.search;
import org.elasticsearch.action.get.GetResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
public class ResponseParse {
private static Logger LOG = LoggerFactory.getLogger(ResponseParse.class);
public static Map<String, Object> parseGetResponse(GetResponse getResponse){
Map<String, Object> source = null;
try {
source = getResponse.getSource();
} catch (Exception e) {
LOG.error(null,e);
}
return source;
}
}
SearchUtil.java
package com.hsiehchou.es.search;
import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.action.get.GetRequestBuilder;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
public class SearchUtil {
private static Logger LOG = LoggerFactory.getLogger(SearchUtil.class);
private static TransportClient client = ESClientUtils.getClient();
public static void main(String[] args) {
TransportClient client = ESClientUtils.getClient();
List<Map<String, Object>> maps = searchSingleData(client, "wechat", "wechat", "phone_mac", "aa-aa-aa-aa-aa-aa");
System.out.println(maps);
/* long l = System.currentTimeMillis();
searchSingleData("tanslator", "tanslator","4e1117d7-c434-48a7-9134-45f7c90f94ee_TR1100397895_2");
System.out.println("消耗时间" + (System.currentTimeMillis() - l));
long lll = System.currentTimeMillis();
searchSingleData("tanslator", "tanslator","4e1117d7-c434-48a7-9134-45f7c90f94ee_TR1100397895_2");
System.out.println("消耗时间" + (System.currentTimeMillis() - lll));
long ll = System.currentTimeMillis();
List<Map<String, Object>> maps = searchSingleData(client,"tanslator", "tanslator", "iolid", "TR1100397895");
System.out.println("消耗时间" + (System.currentTimeMillis() - ll));
System.out.println(maps);*/
}
/**
* Fetch a single document by its id
* @param index index name
* @param type type name
* @param id document id
* @return
*/
public static GetResponse searchSingleData(String index, String type, String id) {
GetResponse response = null;
try {
GetRequestBuilder builder = null;
builder = client.prepareGet(index, type, id);
response = builder.execute().actionGet();
} catch (Exception e) {
LOG.error(null, e);
}
return response;
}
/**
* @param index
* @param type
* @param field
* @param value
* @return
*/
public static List<Map<String, Object>> searchSingleData(TransportClient client,String index, String type,String field, String value) {
List<Map<String, Object>> result = new ArrayList<>();
try {
SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client,index,type);
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery(field, value);
builder.setQuery(matchQueryBuilder).setExplain(false);
SearchResponse searchResponse = builder.execute().actionGet();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit sh : searchHists) {
result.add(sh.getSourceAsMap());
}
} catch (Exception e) {
e.printStackTrace();
LOG.error(null, e);
}
return result;
}
/**
* 多条件查詢
* @param index
* @param type
* @param paramMap 组合查询条件
* @return
*/
public static SearchResponse searchListData(String index, String type,
Map<QueryUtil.OPREATOR,Map<String,Object>> paramMap) {
SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client,index,type);
builder.setQuery(QueryUtil.getSearchParam(paramMap)).setExplain(false);
SearchResponse searchResponse = builder.get();
return searchResponse;
}
/**
* 多条件查詢
* @param index
* @param type
* @param paramMap 组合查询条件
* @return
*/
public static SearchResponse searchListData1(String index, String type, Map<String,String> paramMap) {
// sketch: AND all field/value pairs from paramMap together as match queries
BoolQueryBuilder qb = QueryBuilders.boolQuery();
if (paramMap != null) {
paramMap.forEach((field, value) -> qb.must(QueryBuilders.matchQuery(field, value)));
}
SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client, index, type);
builder.setQuery(qb).setExplain(false);
return builder.get();
}
}
6、utils
ESresultUtil.java
package com.hsiehchou.es.utils;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
public class ESresultUtil {
private static Logger LOG = LoggerFactory.getLogger(ESresultUtil.class);
public static Long getLong(Map<String,Object> esMAp,String field){
Long valueLong = 0L;
if(esMAp!=null && esMAp.size()>0){
if(esMAp.containsKey(field)){
Object value = esMAp.get(field);
if(value!=null && StringUtils.isNotBlank(value.toString())){
valueLong = Long.valueOf(value.toString());
}
}
}
return valueLong;
}
}
UnicodeUtil.java
package com.hsiehchou.es.utils;
import java.util.regex.Pattern;
public class UnicodeUtil {
// 根据Unicode编码完美的判断中文汉字和符号
private static boolean isChinese(char c) {
Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
|| ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
|| ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
|| ub == Character.UnicodeBlock.GENERAL_PUNCTUATION) {
return true;
}
return false;
}
// 完整的判断中文汉字和符号
public static boolean isChinese(String strName) {
char[] ch = strName.toCharArray();
for (int i = 0; i < ch.length; i++) {
char c = ch[i];
if (isChinese(c)) {
return true;
}
}
return false;
}
// Same check as isChinese(String); kept so existing callers of either name keep working
public static boolean hasChinese(String strName) {
return isChinese(strName);
}
// 只能判断部分CJK字符(CJK统一汉字)
public static boolean isChineseByREG(String str) {
if (str == null) {
return false;
}
Pattern pattern = Pattern.compile("[\\u4E00-\\u9FBF]+");
return pattern.matcher(str.trim()).find();
}
// 只能判断部分CJK字符(CJK统一汉字)
/* public static boolean isChineseByName(String str) {
if (str == null) {
return false;
}
// 大小写不同:\\p 表示包含,\\P 表示不包含
// \\p{Cn} 的意思为 Unicode 中未被定义字符的编码,\\P{Cn} 就表示 Unicode中已经被定义字符的编码
String reg = "\\p{InCJK Unified Ideographs}&&\\P{Cn}";
Pattern pattern = Pattern.compile(reg);
return pattern.matcher(str.trim()).find();
}*/
public static void main(String[] args) {
System.out.println(hasChinese("aa表aa"));
}
}
7、V2
ElasticSearchService.java
package com.hsiehchou.es.V2;
import com.hsiehchou.es.client.ESClientUtils;
import org.apache.commons.collections.map.HashedMap;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsRequest;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import java.util.*;
/**
* ES检索封装
*/
public class ElasticSearchService {
private final static int MAX = 10000;
private static TransportClient client = ESClientUtils.getClient();
/**
* 功能描述:新建索引
* @param indexName 索引名
*/
public void createIndex(String indexName) {
client.admin().indices().create(new CreateIndexRequest(indexName))
.actionGet();
}
/**
* 功能描述:新建索引
* @param index 索引名
* @param type 类型
*/
public void createIndex(String index, String type) {
client.prepareIndex(index, type).setSource().get();
}
/**
* 功能描述:删除索引
* @param index 索引名
*/
public void deleteIndex(String index) {
if (indexExist(index)) {
DeleteIndexResponse dResponse = client.admin().indices().prepareDelete(index)
.execute().actionGet();
if (!dResponse.isAcknowledged()) {
}
} else {
}
}
/**
* 功能描述:验证索引是否存在
* @param index 索引名
*/
public boolean indexExist(String index) {
IndicesExistsRequest inExistsRequest = new IndicesExistsRequest(index);
IndicesExistsResponse inExistsResponse = client.admin().indices()
.exists(inExistsRequest).actionGet();
return inExistsResponse.isExists();
}
/**
* 功能描述:插入数据
* @param index 索引名
* @param type 类型
* @param json 数据
*/
public void insertData(String index, String type, String json) {
client.prepareIndex(index, type)
.setSource(json)
.get();
}
/**
* 功能描述:插入数据
* @param index 索引名
* @param type 类型
* @param _id 数据id
* @param json 数据
*/
public void insertData(String index, String type, String _id, String json) {
client.prepareIndex(index, type).setId(_id)
.setSource(json)
.get();
}
/**
* 功能描述:更新数据
* @param index 索引名
* @param type 类型
* @param _id 数据id
* @param json 数据
*/
public void updateData(String index, String type, String _id, String json) throws Exception {
try {
UpdateRequest updateRequest = new UpdateRequest(index, type, _id)
.doc(json);
client.update(updateRequest).get();
} catch (Exception e) {
//throw new MessageException("update data failed.", e);
}
}
/**
* 功能描述:删除数据
* @param index 索引名
* @param type 类型
* @param _id 数据id
*/
public void deleteData(String index, String type, String _id) {
client.prepareDelete(index, type, _id)
.get();
}
/**
* 功能描述:批量插入数据
* @param index 索引名
* @param type 类型
* @param data (_id 主键, json 数据)
*/
public void bulkInsertData(String index, String type, Map<String, String> data) {
BulkRequestBuilder bulkRequest = client.prepareBulk();
data.forEach((param1, param2) -> {
bulkRequest.add(client.prepareIndex(index, type, param1)
.setSource(param2)
);
});
bulkRequest.get();
}
/**
* 功能描述:批量插入数据
* @param index 索引名
* @param type 类型
* @param jsonList 批量数据
*/
public void bulkInsertData(String index, String type, List<String> jsonList) {
BulkRequestBuilder bulkRequest = client.prepareBulk();
jsonList.forEach(item -> {
bulkRequest.add(client.prepareIndex(index, type)
.setSource(item)
);
});
bulkRequest.get();
}
/**
* 功能描述:查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
*/
public List<Map<String, Object>> search(String index, String type, ESQueryBuilderConstructor constructor) {
List<Map<String, Object>> list = new ArrayList<>();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
searchRequestBuilder.setQuery(constructor.listBuilders());
//返回条目数
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit sh : searchHists) {
list.add(sh.getSourceAsMap());
}
return list;
}
/**
* 功能描述:查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
*/
public Map<String,Object> searchCountAndMessage(String index, String type, ESQueryBuilderConstructor constructor) {
Map<String,Object> map = new HashMap<String,Object>();
List<Map<String, Object>> list = new ArrayList<>();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
searchRequestBuilder.setQuery(constructor.listBuilders());
//返回条目数
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
long totalHits = searchResponse.getHits().getTotalHits();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit sh : searchHists) {
list.add(sh.getSourceAsMap());
}
map.put("total",(long)searchHists.length);
map.put("count",totalHits);
map.put("data",list);
return map;
}
/**
* 功能描述:查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
*/
public Map<String,Object> searchCountAndMessageNew(String index, String type, ESQueryBuilderConstructorNew constructor) {
Map<String,Object> map = new HashMap<String,Object>();
List<Map<String, Object>> list = new ArrayList<>();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
List<SortBuilder> sortBuilderList = constructor.getSortBuilderList();
if(sortBuilderList!=null && sortBuilderList.size()>0){
sortBuilderList.forEach(sortBuilder->{
searchRequestBuilder.addSort(sortBuilder);
});
}
//设置查询体
searchRequestBuilder.setQuery(constructor.listBuilders());
//返回条目数
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
//设置高亮
HighlightBuilder highlightBuilder = new HighlightBuilder();
List<String> highLighterFields = constructor.getHighLighterFields();
if(highLighterFields.size()>0){
highLighterFields.forEach(field -> {
highlightBuilder.field(field);
});
}
highlightBuilder.preTags("<font color=\"red\">");
highlightBuilder.postTags("</font>");
SearchResponse searchResponse = searchRequestBuilder.highlighter(highlightBuilder).execute().actionGet();
long totalHits = searchResponse.getHits().getTotalHits();
SearchHits hits = searchResponse.getHits();
SearchHit[] searchHists = hits.getHits();
for (SearchHit hit : searchHists) {
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
//获取高亮结果
Set<String> set = highlightFields.keySet();
for (String str : set) {
Text[] fragments = highlightFields.get(str).getFragments();
String st1r="";
for(Text text:fragments){
st1r = st1r + text.toString();
}
sourceAsMap.put(str,st1r);
System.out.println("str(==============" + st1r);
}
list.add(sourceAsMap);
}
map.put("total",(long)searchHists.length);
map.put("count",totalHits);
map.put("data",list);
return map;
}
/**
* 功能描述:统计查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
* @param groupBy 统计字段
*/
public Map<Object, Object> statSearch(String index, String type, ESQueryBuilderConstructor constructor, String groupBy) {
Map<Object, Object> map = new HashedMap();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
if (null != constructor) {
searchRequestBuilder.setQuery(constructor.listBuilders());
} else {
searchRequestBuilder.setQuery(QueryBuilders.matchAllQuery());
}
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse sr = searchRequestBuilder.addAggregation(
AggregationBuilders.terms("agg").field(groupBy)
).get();
Terms stateAgg = sr.getAggregations().get("agg");
Iterator<? extends Terms.Bucket> iter = stateAgg.getBuckets().iterator();
while (iter.hasNext()) {
Terms.Bucket gradeBucket = iter.next();
map.put(gradeBucket.getKey(), gradeBucket.getDocCount());
}
return map;
}
/**
* 功能描述:统计查询
* @param index 索引名
* @param type 类型
* @param constructor 查询构造
* @param agg 自定义计算
*/
public Map<Object, Object> statSearch(String index, String type, ESQueryBuilderConstructor constructor, AggregationBuilder agg) {
if (agg == null) {
return null;
}
Map<Object, Object> map = new HashedMap();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
//排序
if (StringUtils.isNotEmpty(constructor.getAsc()))
searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
if (StringUtils.isNotEmpty(constructor.getDesc()))
searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
//设置查询体
if (null != constructor) {
searchRequestBuilder.setQuery(constructor.listBuilders());
} else {
searchRequestBuilder.setQuery(QueryBuilders.matchAllQuery());
}
int size = constructor.getSize();
if (size < 0) {
size = 0;
}
if (size > MAX) {
size = MAX;
}
//返回条目数
searchRequestBuilder.setSize(size);
searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
SearchResponse sr = searchRequestBuilder.addAggregation(
agg
).get();
Terms stateAgg = sr.getAggregations().get("agg");
Iterator<? extends Terms.Bucket> iter = stateAgg.getBuckets().iterator();
while (iter.hasNext()) {
Terms.Bucket gradeBucket = iter.next();
map.put(gradeBucket.getKey(), gradeBucket.getDocCount());
}
return map;
}
/**
* 功能描述:关闭链接
*/
public void close() {
client.close();
}
public static void test() {
try{
ElasticSearchService service = new ElasticSearchService();
ESQueryBuilderConstructorNew constructor = new ESQueryBuilderConstructorNew();
constructor.must(new ESQueryBuilders().bool(QueryBuilders.boolQuery()));
constructor.must(new ESQueryBuilders().match("secondlanguage", "4"));
constructor.must(new ESQueryBuilders().match("secondlanguage", "4"));
constructor.should(new ESQueryBuilders().match("source", "5"));
constructor.should(new ESQueryBuilders().match("source", "5"));
service.searchCountAndMessageNew("", "", constructor);
}catch (Exception e){
e.printStackTrace();
}
}
public static void main(String[] args) {
try {
ElasticSearchService service = new ElasticSearchService();
ESQueryBuilderConstructor constructor = new ESQueryBuilderConstructor();
/* constructor.must(new ESQueryBuilders().term("gender", "f").range("age", 20, 50));
constructor.should(new ESQueryBuilders().term("gender", "f").range("age", 20, 50).fuzzy("age", 20));
constructor.mustNot(new ESQueryBuilders().term("gender", "m"));
constructor.setSize(15); //查询返回条数,最大 10000
constructor.setFrom(11); //分页查询条目起始位置, 默认0
constructor.setAsc("age"); //排序
List<Map<String, Object>> list = service.search("bank", "account", constructor);
Map<Object, Object> map = service.statSearch("bank", "account", constructor, "state");*/
constructor.must(new ESQueryBuilders().match("id", "WE16000190TR"));
List<Map<String, Object>> list = service.search("test01", "test01", constructor);
for(Map<String, Object> map : list){
System.out.println(map);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
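The AggregationBuilder overload of statSearch reads its result back under the fixed name "agg", so a caller-supplied aggregation has to be registered with exactly that name. A hedged sketch with sample index, type and field names:
StatSearchDemo.java
package com.hsiehchou.es.V2;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import java.util.Map;
public class StatSearchDemo {
    public static void main(String[] args) {
        ElasticSearchService service = new ElasticSearchService();
        ESQueryBuilderConstructor constructor = new ESQueryBuilderConstructor();
        constructor.must(new ESQueryBuilders().match("phone_mac", "aa-aa-aa-aa-aa-aa"));
        constructor.setSize(0); // only the aggregation buckets are needed
        // the terms aggregation must be named "agg" because statSearch looks it up by that name
        Map<Object, Object> counts = service.statSearch(
                "wechat", "wechat", constructor,
                AggregationBuilders.terms("agg").field("object_username.keyword"));
        counts.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}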
ESCriterion.java
package com.hsiehchou.es.V2;
import org.elasticsearch.index.query.QueryBuilder;
import java.util.List;
/**
* 条件接口
*/
public interface ESCriterion {
public enum Operator {
PREFIX, /** prefix query **/
MATCH, /** match query **/
MATCH_PHRASE, /** exact phrase match **/
MULTI_MATCH, /** multi-field match **/
TERM, /** term query **/
TERMS, /** terms query **/
RANGE, /** range query **/
GTE, /** greater than or equal **/
LTE, /** less than or equal **/
FUZZY, /** fuzzy query **/
QUERY_STRING, /** query_string query **/
MISSING, /** missing-field query **/
BOOL /** bool query **/
}
public enum MatchMode {
START, END, ANYWHERE
}
public enum Projection {
MAX, MIN, AVG, LENGTH, SUM, COUNT
}
public List<QueryBuilder> listBuilders();
}
ESQueryBuilderConstructor.java
package com.hsiehchou.es.V2;
import org.apache.commons.collections.CollectionUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import java.util.ArrayList;
import java.util.List;
/**
* 查询条件容器
*/
public class ESQueryBuilderConstructor {
private int size = Integer.MAX_VALUE;
private int from = 0;
private String asc;
private String desc;
//查询条件容器
private List<ESCriterion> mustCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> shouldCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> mustNotCriterions = new ArrayList<ESCriterion>();
//构造builder
public QueryBuilder listBuilders() {
int count = mustCriterions.size() + shouldCriterions.size() + mustNotCriterions.size();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
QueryBuilder queryBuilder = null;
if (count >= 1) {
//must容器
if (!CollectionUtils.isEmpty(mustCriterions)) {
for (ESCriterion criterion : mustCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.must(builder);
}
}
}
//should容器
if (!CollectionUtils.isEmpty(shouldCriterions)) {
for (ESCriterion criterion : shouldCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.should(builder);
}
}
}
//must not 容器
if (!CollectionUtils.isEmpty(mustNotCriterions)) {
for (ESCriterion criterion : mustNotCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.mustNot(builder);
}
}
}
return queryBuilder;
} else {
return null;
}
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructor must(ESCriterion criterion){
if(criterion!=null){
mustCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructor should(ESCriterion criterion){
if(criterion!=null){
shouldCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructor mustNot(ESCriterion criterion){
if(criterion!=null){
mustNotCriterions.add(criterion);
}
return this;
}
public int getSize() {
return size;
}
public void setSize(int size) {
this.size = size;
}
public String getAsc() {
return asc;
}
public void setAsc(String asc) {
this.asc = asc;
}
public String getDesc() {
return desc;
}
public void setDesc(String desc) {
this.desc = desc;
}
public int getFrom() {
return from;
}
public void setFrom(int from) {
this.from = from;
}
}
ESQueryBuilderConstructorNew.java
package com.hsiehchou.es.V2;
import org.apache.commons.collections.CollectionUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
* 查询条件容器
*/
public class ESQueryBuilderConstructorNew {
private List<String> highLighterFields = new ArrayList<String>();
private int size = Integer.MAX_VALUE;
private int from = 0;
private List<SortBuilder> sortBuilderList;
public List<SortBuilder> getSortBuilderList() {
return sortBuilderList;
}
public void setSortBuilderList(List<SortBuilder> sortBuilderList) {
this.sortBuilderList = sortBuilderList;
}
private Map<String,List<String>> sortMap;
//查询条件容器
private List<ESCriterion> mustCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> shouldCriterions = new ArrayList<ESCriterion>();
private List<ESCriterion> mustNotCriterions = new ArrayList<ESCriterion>();
//构造builder
public QueryBuilder listBuilders() {
int count = mustCriterions.size() + shouldCriterions.size() + mustNotCriterions.size();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
QueryBuilder queryBuilder = null;
if (count >= 1) {
//must容器
if (!CollectionUtils.isEmpty(mustCriterions)) {
for (ESCriterion criterion : mustCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.must(builder);
}
}
}
//should容器
if (!CollectionUtils.isEmpty(shouldCriterions)) {
for (ESCriterion criterion : shouldCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.should(builder);
}
}
}
//must not 容器
if (!CollectionUtils.isEmpty(mustNotCriterions)) {
for (ESCriterion criterion : mustNotCriterions) {
for (QueryBuilder builder : criterion.listBuilders()) {
queryBuilder = boolQueryBuilder.mustNot(builder);
}
}
}
return queryBuilder;
} else {
return null;
}
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructorNew must(ESCriterion criterion){
if(criterion!=null){
mustCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructorNew should(ESCriterion criterion){
if(criterion!=null){
shouldCriterions.add(criterion);
}
return this;
}
/**
* 增加简单条件表达式
*/
public ESQueryBuilderConstructorNew mustNot(ESCriterion criterion){
if(criterion!=null){
mustNotCriterions.add(criterion);
}
return this;
}
public List<String> getHighLighterFields() {
return highLighterFields;
}
public void setHighLighterFields(List<String> highLighterFields) {
this.highLighterFields = highLighterFields;
}
public int getSize() {
return size;
}
public void setSize(int size) {
this.size = size;
}
public Map<String, List<String>> getSortMap() {
return sortMap;
}
public void setSortMap(Map<String, List<String>> sortMap) {
this.sortMap = sortMap;
}
public int getFrom() {
return from;
}
public void setFrom(int from) {
this.from = from;
}
}
ESQueryBuilders.java
package com.hsiehchou.es.V2;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.NestedQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
/**
* 条件构造器
*/
public class ESQueryBuilders implements ESCriterion{
private List<QueryBuilder> list = new ArrayList<QueryBuilder>();
/**
* 功能描述:match 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders match(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.MATCH).toBuilder());
return this;
}
/**
* match_phrase query (exact phrase match)
* @param field field name
* @param value value
*/
public ESQueryBuilders match_phrase(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.MATCH_PHRASE).toBuilder());
return this;
}
/**
* multi_match query across several fields
* @param value value
* @param fieldNames field names
*/
public ESQueryBuilders multi_match(Object value , String... fieldNames ) {
String[] fields = fieldNames;
list.add(new ESSimpleExpression (value, Operator.MULTI_MATCH,fields).toBuilder());
return this;
}
/**
* 功能描述:Term 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders term(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.TERM).toBuilder());
return this;
}
/**
* 功能描述:Terms 查询
* @param field 字段名
* @param values 集合值
*/
public ESQueryBuilders terms(String field, Collection<Object> values) {
list.add(new ESSimpleExpression (field, values).toBuilder());
return this;
}
/**
* 功能描述:fuzzy 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders fuzzy(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.FUZZY).toBuilder());
return this;
}
/**
* 功能描述:Range 查询
* @param from 起始值
* @param to 末尾值
*/
public ESQueryBuilders range(String field, Object from, Object to) {
list.add(new ESSimpleExpression (field, from, to).toBuilder());
return this;
}
/**
* 功能描述:GTE 大于等于查询
* @param
*/
public ESQueryBuilders gte(String field, Object num) {
list.add(new ESSimpleExpression (field, num,Operator.GTE).toBuilder());
return this;
}
/**
* 功能描述:LTE 小于等于查询
* @param
*/
public ESQueryBuilders lte(String field, Object num) {
list.add(new ESSimpleExpression (field, num,Operator.LTE).toBuilder());
return this;
}
/**
* 功能描述:prefix 查询
* @param field 字段名
* @param value 值
*/
public ESQueryBuilders prefix(String field, Object value) {
list.add(new ESSimpleExpression (field, value, Operator.PREFIX).toBuilder());
return this;
}
/**
* query_string query
* @param queryString query string
*/
public ESQueryBuilders queryString(String queryString) {
list.add(new ESSimpleExpression (queryString, Operator.QUERY_STRING).toBuilder());
return this;
}
/**
* bool query: wrap an already-built BoolQueryBuilder
* @param boolQueryBuilder
*/
public ESQueryBuilders bool(BoolQueryBuilder boolQueryBuilder) {
list.add(boolQueryBuilder);
return this;
}
public ESQueryBuilders nested(NestedQueryBuilder nestedQueryBuilder) {
list.add(nestedQueryBuilder);
return this;
}
public List<QueryBuilder> listBuilders() {
return list;
}
}
ESSimpleExpression.java
package com.hsiehchou.es.V2;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import java.util.Collection;
import com.hsiehchou.es.V2.ESCriterion.Operator;
import static org.elasticsearch.index.search.MatchQuery.Type.PHRASE;
/**
* 条件表达式
*/
public class ESSimpleExpression {
private String[] fieldNames; //属性名
private String fieldName; //属性名
private Object value; //对应值
private Collection<Object> values;//对应值
private Operator operator; //计算符
private Object from;
private Object to;
protected ESSimpleExpression() {
}
protected ESSimpleExpression(Object value, Operator operator,String... fieldNames) {
this.fieldNames = fieldNames;
this.value = value;
this.operator = operator;
}
protected ESSimpleExpression(String fieldName, Object value, Operator operator) {
this.fieldName = fieldName;
this.value = value;
this.operator = operator;
}
protected ESSimpleExpression(String value, Operator operator) {
this.value = value;
this.operator = operator;
}
protected ESSimpleExpression(String fieldName, Collection<Object> values) {
this.fieldName = fieldName;
this.values = values;
this.operator = Operator.TERMS;
}
protected ESSimpleExpression(String fieldName, Object from, Object to) {
this.fieldName = fieldName;
this.from = from;
this.to = to;
this.operator = Operator.RANGE;
}
public BoolQueryBuilder toBoolQueryBuilder(){
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.mustNot(QueryBuilders.matchQuery("",""));
boolQueryBuilder.mustNot(QueryBuilders.matchQuery("",""));
return null;
}
public QueryBuilder toBuilder() {
QueryBuilder qb = null;
switch (operator) {
case MATCH:
qb = QueryBuilders.matchQuery(fieldName, value);
break;
case MATCH_PHRASE:
qb = QueryBuilders.matchPhraseQuery(fieldName, value);
break;
case MULTI_MATCH:
qb = QueryBuilders.multiMatchQuery(value,fieldNames).type(PHRASE);
break;
case TERM:
qb = QueryBuilders.termQuery(fieldName, value);
break;
case TERMS:
qb = QueryBuilders.termsQuery(fieldName, values);
break;
case RANGE:
qb = QueryBuilders.rangeQuery(fieldName).from(from).to(to).includeLower(true).includeUpper(true);
break;
case GTE:
qb = QueryBuilders.rangeQuery(fieldName).gte(value);
break;
case LTE:
qb = QueryBuilders.rangeQuery(fieldName).lte(value);
break;
case FUZZY:
qb = QueryBuilders.fuzzyQuery(fieldName, value);
break;
case PREFIX:
qb = QueryBuilders.prefixQuery(fieldName, value.toString());
break;
case QUERY_STRING:
qb = QueryBuilders.queryStringQuery(value.toString());
break;
default:
}
return qb;
}
}
九、Alerting
Rules are configured from the backend or the UI, stored in MySQL, and then synchronized to Redis.
Querying MySQL directly for every record is far too slow once data volume grows, so the rules are cached in the in-memory store Redis and incoming records are compared against Redis when alerting.
MySQL needs two tables:
- a rule table that stores the alert rules
- a message table that stores the generated alert messages
1、Create the rule table (rule publishing is controlled from the UI)
Rules are stored in MySQL first; a scheduled task then synchronizes them from MySQL into Redis.
Create the table directly in the test database.
Creation script:
xz_rule.sql
SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `xz_rule`;
CREATE TABLE `xz_rule` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`warn_fieldname` varchar(20) DEFAULT NULL,
`warn_fieldvalue` varchar(255) DEFAULT NULL,
`publisher` varchar(255) DEFAULT NULL,
`send_type` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`send_mobile` varchar(255) DEFAULT NULL,
`send_mail` varchar(255) DEFAULT NULL,
`send_dingding` varchar(255) DEFAULT NULL,
`create_time` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;
INSERT INTO `xz_rule` VALUES ('1', 'phone', '18609765432', '?????1', '2', '13724536789', '1782324@qq.com', '32143243', '2019-06-28');
2、Create the message table
- stores the alert messages so the UI can poll for new alerts or scroll them on screen
- also used for alert-message statistics
warn_message.sql
SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `warn_message`;
CREATE TABLE `warn_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`alarmRuleid` varchar(255) DEFAULT NULL,
`alarmType` varchar(255) DEFAULT NULL,
`sendType` varchar(255) DEFAULT NULL,
`sendMobile` varchar(255) DEFAULT NULL,
`sendEmail` varchar(255) DEFAULT NULL,
`sendStatus` varchar(255) DEFAULT NULL,
`senfInfo` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`hitTime` datetime DEFAULT NULL,
`checkinTime` datetime DEFAULT NULL,
`isRead` varchar(255) DEFAULT NULL,
`readAccounts` varchar(255) DEFAULT NULL,
`alarmaccounts` varchar(255) DEFAULT NULL,
`accountid` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=31 DEFAULT CHARSET=latin1;
3、Create the database connection utility class
Create a new package com.hsiehchou.common.netb.db
Create the DBCommon class
DBCommon.java
package com.hsiehchou.common.netb.db;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.Properties;
public class DBCommon {
private static Logger LOG = LoggerFactory.getLogger(DBCommon.class);
private static String MYSQL_PATH = "common/mysql.properties";
private static Properties properties = ConfigUtil.getInstance().getProperties(MYSQL_PATH);
private static Connection conn ;
private DBCommon(){}
public static void main(String[] args) {
System.out.println(properties);
Connection xz_bigdata = DBCommon.getConn("test");
System.out.println(xz_bigdata);
}
//TODO 配置文件
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
private static final String USER_NAME = properties.getProperty("user");
private static final String PASSWORD = properties.getProperty("password");
private static final String IP = properties.getProperty("db_ip");
private static final String PORT = properties.getProperty("db_port");
private static final String DB_CONFIG = "?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&autoReconnect=true&failOverReadOnly=false";
static {
try {
Class.forName(JDBC_DRIVER);
} catch (ClassNotFoundException e) {
LOG.error(null, e);
}
}
/**
* 获取数据库连接
* @param dbName
* @return
*/
public static Connection getConn(String dbName) {
Connection conn = null;
String connstring = "jdbc:mysql://"+IP+":"+PORT+"/"+dbName+DB_CONFIG;
try {
conn = DriverManager.getConnection(connstring, USER_NAME, PASSWORD);
} catch (SQLException e) {
e.printStackTrace();
LOG.error(null, e);
}
return conn;
}
/**
* @param url eg:"jdbc:oracle:thin:@172.16.1.111:1521:d406"
* @param driver eg:"oracle.jdbc.driver.OracleDriver"
* @param user eg:"ucase"
* @param password eg:"ucase123"
* @return
* @throws ClassNotFoundException
* @throws SQLException
*/
public static Connection getConn(String url, String driver, String user,
String password) throws ClassNotFoundException, SQLException{
Class.forName(driver);
conn = DriverManager.getConnection(url, user, password);
return conn;
}
public static void close(Connection conn){
try {
if( conn != null ){
conn.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Statement statement){
try {
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,PreparedStatement statement){
try {
if( conn != null ){
conn.close();
}
if( statement != null ){
statement.close();
}
} catch (SQLException e) {
LOG.error(null,e);
}
}
public static void close(Connection conn,Statement statement,ResultSet resultSet) throws SQLException{
if( resultSet != null ){
resultSet.close();
}
if( statement != null ){
statement.close();
}
if( conn != null ){
conn.close();
}
}
}
Add the Maven dependency (the commons-dbutils.version property must be declared in the parent pom, for example 1.6):
<dependency>
<groupId>commons-dbutils</groupId>
<artifactId>commons-dbutils</artifactId>
<version>${commons-dbutils.version}</version>
</dependency>
4、Create the entity classes and DAOs
Create a new package com.hsiehchou.spark.warn.domain
Create XZ_RuleDomain and WarningMessage
XZ_RuleDomain.java
package com.hsiehchou.spark.warn.domain;
import java.sql.Date;
public class XZ_RuleDomain {
private int id;
private String warn_fieldname; //预警字段
private String warn_fieldvalue; //预警内容
private String publisher; //发布者
private String send_type; //消息接收方式
private String send_mobile; //接收手机号
private String send_mail; //接收邮箱
private String send_dingding; //接收钉钉
private Date create_time; //创建时间
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public String getWarn_fieldname() {
return warn_fieldname;
}
public void setWarn_fieldname(String warn_fieldname) {
this.warn_fieldname = warn_fieldname;
}
public String getWarn_fieldvalue() {
return warn_fieldvalue;
}
public void setWarn_fieldvalue(String warn_fieldvalue) {
this.warn_fieldvalue = warn_fieldvalue;
}
public String getPublisher() {
return publisher;
}
public void setPublisher(String publisher) {
this.publisher = publisher;
}
public String getSend_type() {
return send_type;
}
public void setSend_type(String send_type) {
this.send_type = send_type;
}
public String getSend_mobile() {
return send_mobile;
}
public void setSend_mobile(String send_mobile) {
this.send_mobile = send_mobile;
}
public String getSend_mail() {
return send_mail;
}
public void setSend_mail(String send_mail) {
this.send_mail = send_mail;
}
public String getSend_dingding() {
return send_dingding;
}
public void setSend_dingding(String send_dingding) {
this.send_dingding = send_dingding;
}
public Date getCreate_time() {
return create_time;
}
public void setCreate_time(Date create_time) {
this.create_time = create_time;
}
}
WarningMessage.java
package com.hsiehchou.spark.warn.domain;
import java.sql.Date;
public class WarningMessage {
private String id; //主键id
private String alarmRuleid; //规则id
private String alarmType; //告警类型
private String sendType; //发送方式
private String sendMobile; //发送至手机
private String sendEmail; //发送至邮箱
private String sendStatus; //发送状态
private String senfInfo; //发送内容
private Date hitTime; //命中时间
private Date checkinTime; //入库时间
private String isRead; //是否已读
private String readAccounts; //已读用户
private String alarmaccounts;
private String accountid;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getAlarmRuleid() {
return alarmRuleid;
}
public void setAlarmRuleid(String alarmRuleid) {
this.alarmRuleid = alarmRuleid;
}
public String getAlarmType() {
return alarmType;
}
public void setAlarmType(String alarmType) {
this.alarmType = alarmType;
}
public String getSendType() {
return sendType;
}
public void setSendType(String sendType) {
this.sendType = sendType;
}
public String getSendMobile() {
return sendMobile;
}
public void setSendMobile(String sendMobile) {
this.sendMobile = sendMobile;
}
public String getSendEmail() {
return sendEmail;
}
public void setSendEmail(String sendEmail) {
this.sendEmail = sendEmail;
}
public String getSendStatus() {
return sendStatus;
}
public void setSendStatus(String sendStatus) {
this.sendStatus = sendStatus;
}
public String getSenfInfo() {
return senfInfo;
}
public void setSenfInfo(String senfInfo) {
this.senfInfo = senfInfo;
}
public Date getHitTime() {
return hitTime;
}
public void setHitTime(Date hitTime) {
this.hitTime = hitTime;
}
public Date getCheckinTime() {
return checkinTime;
}
public void setCheckinTime(Date checkinTime) {
this.checkinTime = checkinTime;
}
public String getIsRead() {
return isRead;
}
public void setIsRead(String isRead) {
this.isRead = isRead;
}
public String getReadAccounts() {
return readAccounts;
}
public void setReadAccounts(String readAccounts) {
this.readAccounts = readAccounts;
}
public String getAlarmaccounts() {
return alarmaccounts;
}
public void setAlarmaccounts(String alarmaccounts) {
this.alarmaccounts = alarmaccounts;
}
public String getAccountid() {
return accountid;
}
public void setAccountid(String accountid) {
this.accountid = accountid;
}
@Override
public String toString() {
return "WarningMessage{" +
"id='" + id + '\'' +
", alarmRuleid='" + alarmRuleid + '\'' +
", alarmType='" + alarmType + '\'' +
", sendType='" + sendType + '\'' +
", sendMobile='" + sendMobile + '\'' +
", sendEmail='" + sendEmail + '\'' +
", sendStatus='" + sendStatus + '\'' +
", senfInfo='" + senfInfo + '\'' +
", hitTime=" + hitTime +
", checkinTime=" + checkinTime +
", isRead='" + isRead + '\'' +
", readAccounts='" + readAccounts + '\'' +
", alarmaccounts='" + alarmaccounts + '\'' +
", accountid='" + accountid + '\'' +
'}';
}
}
Create a new package com.hsiehchou.spark.warn.dao
Create XZ_RuleDao and WarningMessageDao
XZ_RuleDao.java
package com.hsiehchou.spark.warn.dao;
import com.hsiehchou.common.netb.db.DBCommon;
import com.hsiehchou.spark.warn.domain.XZ_RuleDomain;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.BeanListHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
public class XZ_RuleDao {
private static final Logger LOG = LoggerFactory.getLogger(XZ_RuleDao.class);
/**
* 获取所有的规则
* @return
*/
public static List<XZ_RuleDomain> getRuleList(){
List<XZ_RuleDomain> listRules = null;
//获取连接
Connection conn = DBCommon.getConn("test");
//执行器
QueryRunner query = new QueryRunner();
String sql = "select * from xz_rule";
try {
listRules = query.query(conn,sql,new BeanListHandler<>(XZ_RuleDomain.class));
} catch (SQLException e) {
LOG.error(null,e);
}finally {
DBCommon.close(conn);
}
return listRules;
}
public static void main(String[] args) {
List<XZ_RuleDomain> ruleList = XZ_RuleDao.getRuleList();
System.out.println(ruleList.size());
ruleList.forEach(x->{
System.out.println(x);
});
}
}
WarningMessageDao.java
package com.hsiehchou.spark.warn.dao;
import com.hsiehchou.common.netb.db.DBCommon;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
public class WarningMessageDao {
private static final Logger LOG = LoggerFactory.getLogger(WarningMessageDao.class);
/**
* 写入消息到mysql
* @param warningMessage
* @return
*/
public static Integer insertWarningMessageReturnId(WarningMessage warningMessage) {
Connection conn= DBCommon.getConn("test");
String sql="insert into warn_message(alarmruleid,sendtype,senfinfo,hittime,sendmobile,alarmtype) " +
"values(?,?,?,?,?,?)";
PreparedStatement stmt=null;
ResultSet resultSet=null;
int id=-1;
try{
stmt = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
stmt.setString(1,warningMessage.getAlarmRuleid());
stmt.setInt(2,Integer.valueOf(warningMessage.getSendType()));
stmt.setString(3,warningMessage.getSenfInfo());
stmt.setTimestamp(4,new Timestamp(System.currentTimeMillis()));
stmt.setString(5,warningMessage.getSendMobile());
stmt.setInt(6,Integer.valueOf(warningMessage.getAlarmType()));
stmt.executeUpdate();
// read back the auto-increment primary key generated for this insert
resultSet = stmt.getGeneratedKeys();
if (resultSet.next()) {
id = resultSet.getInt(1);
}
}catch(Exception e) {
LOG.error(null,e);
}finally {
try {
DBCommon.close(conn,stmt,resultSet);
} catch (SQLException e) {
e.printStackTrace();
}
}
return id;
}
}
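A quick sanity check for the DAO above. Note that sendType and alarmType are written through Integer.valueOf(...), so both must be numeric strings ("2" means SMS in this project's convention); the other values are samples.
WarningMessageDaoDemo.java
package com.hsiehchou.spark.warn.dao;
import com.hsiehchou.spark.warn.domain.WarningMessage;
public class WarningMessageDaoDemo {
    public static void main(String[] args) {
        WarningMessage msg = new WarningMessage();
        msg.setAlarmRuleid("1");          // id of the rule that was hit
        msg.setAlarmType("2");
        msg.setSendType("2");             // 2 = SMS
        msg.setSendMobile("13724536789"); // sample number from the rule table insert
        msg.setSenfInfo("test warning message");
        int id = WarningMessageDao.insertWarningMessageReturnId(msg);
        System.out.println("generated id = " + id);
    }
}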
5、Alert utility classes
Create a new package com.hsiehchou.spark.warn.service
Create BlackRuleWarning and WarningMessageSendUtil
BlackRuleWarning.java
package com.hsiehchou.spark.warn.service;
import com.hsiehchou.spark.warn.dao.WarningMessageDao;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
public class BlackRuleWarning {
private static final Logger LOG = LoggerFactory.getLogger(BlackRuleWarning.class);
//可以通过数据库,配置文件加载
//为了遍历所有预警字段
private static List<String> listWarnFields = new ArrayList<>();
static {
listWarnFields.add("phone");
listWarnFields.add("mac");
}
/**
* 预警流程处理
* @param map
* @param jedis15
*/
public static void blackWarning(Map<String, Object> map, Jedis jedis15) {
listWarnFields.forEach(warnField -> {
if (map.containsKey(warnField) && StringUtils.isNotBlank(map.get(warnField).toString())) {
//获取预警字段核预警值 相当于手机号
String warnFieldValue = map.get(warnField).toString();
//去redis中进行比对
//数据中 通过 "字段" + "字段值" 去拼接key
// phone : 186XXXXXX
String key = warnField + ":" + warnFieldValue;
//redis中的key是 phone:18609765435
System.out.println("拼接数据流中的key=======" + key);
if (jedis15.exists(key)) {
//对比命中之后 就可以发送消息提醒
System.out.println("命中REDIS中的" + key + "===========开始预警");
beginWarning(jedis15, key);
} else {
//直接过
System.out.println("未命中" + key + "===========不进行预警");
}
}
});
}
/**
* 规则已经命中,开始预警
* @param jedis15
* @param key
*/
private static void beginWarning( Jedis jedis15, String key) {
System.out.println("============MESSAGE -1- =========");
//封装告警 信息及告警消息
WarningMessage warningMessage = getWarningMessage(jedis15, key);
System.out.println("============MESSAGE -4- =========");
if (warningMessage != null) {
//将预警信息写入预警信息表
WarningMessageDao.insertWarningMessageReturnId(warningMessage);
//String accountid = warningMessage.getAccountid();
//String readAccounts = warningMessage.getAlarmaccounts();
// WarnService.insertRead_status(messageId, accountid);
if (warningMessage.getSendType().equals("2")) {
//手机短信告警 默认告警方式
WarningMessageSendUtil.messageWarn(warningMessage);
}
}
}
/**
* 封装告警信息及告警消息
* @param jedis15
* @param key
* @return
*/
private static WarningMessage getWarningMessage(Jedis jedis15, String key) {
System.out.println("============MESSAGE -2- =========");
//封装消息
String[] split = key.split(":");
if (split.length == 2) {
WarningMessage warningMessage = new WarningMessage();
String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
String clew_type = split[0];//告警字段
String rulecontent = split[1];//告警字段值
//从redis中获取消息信息进行封装
Map<String, String> valueMap = jedis15.hgetAll(key);
//规则ID (是哪条规则命中的)
warningMessage.setAlarmRuleid(valueMap.get("id"));
//预警方式
warningMessage.setSendType(valueMap.get("send_type"));//告警方式,0:界面 1:邮件 2:短信 3:邮件+短信
//预警信息接收手机号
warningMessage.setSendMobile(valueMap.get("send_mobile"));
//arningMessage.setSendEmail(valueMap.get("sendemail"));
/*arningMessage.setAlarmaccounts(valueMap.get("alarmaccounts"));*/
//规则发布人
warningMessage.setAccountid(valueMap.get("publisher"));
warningMessage.setAlarmType("2");
StringBuffer warn_content = new StringBuffer();
//预警内容 信息 时间 地点 人物
//预警字段来进行设置 phone
//我们有手机号
//数据关联
// 手机 MAC 身份证, 车牌 人脸。。URL 姓名
// 全部设在推送消息里面
// alert text; location and collecting-device details can be appended here once the record carries them
warn_content.append("【网络告警】:手机号[" + rulecontent + "]在时间" + time + "命中预警规则");
String content = warn_content.toString();
warningMessage.setSenfInfo(content);
System.out.println("============MESSAGE -3- =========");
return warningMessage;
} else {
return null;
}
}
}
WarningMessageSendUtil.java
package com.hsiehchou.spark.warn.service;
import com.hsiehchou.common.regex.Validation;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class WarningMessageSendUtil {
private static final Logger LOG = LoggerFactory.getLogger(WarningMessageSendUtil.class);
public static void messageWarn(WarningMessage warningMessage) {
String[] mobiles = warningMessage.getSendMobile().split(",");
for(String phone:mobiles){
if(Validation.isMobile(phone)){
System.out.println("开始向手机号为" + phone + "发送告警消息====" + warningMessage);
StringBuffer sb= new StringBuffer();
String content=warningMessage.getSenfInfo().toString();
//TODO 调用短信接口发送消息
//TODO 怎么通过短信发送 这个是需要公司开通接口
//TODO DINGDING
// 专门的接口
/* sb.append(ClusterProperties.https_url + "username=" + ClusterProperties.https_username +
"&password=" + ClusterProperties.https_password + "&mobile=" + phone +
"&apikey=" + ClusterProperties.https_apikey+
"&content=" + URLEncoder.encode(content));*/
// sendMessage(sb.toString());
}
}
}
}
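The TODOs above leave the actual SMS gateway call open because it depends on a company-provided interface. Purely as an illustration, a hypothetical sendMessage(...) sketch against a generic HTTP GET gateway; the URL and parameter names are placeholders, not a real provider's API.
SmsGatewaySketch.java
package com.hsiehchou.spark.warn.service;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
public class SmsGatewaySketch {
    public static int sendMessage(String gatewayUrl, String apikey, String mobile, String content) throws Exception {
        // build the request URL with URL-encoded parameters
        String url = gatewayUrl
                + "?apikey=" + URLEncoder.encode(apikey, "UTF-8")
                + "&mobile=" + URLEncoder.encode(mobile, "UTF-8")
                + "&content=" + URLEncoder.encode(content, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(5000);
        int code = conn.getResponseCode(); // most gateways return 200 when the message is accepted
        conn.disconnect();
        return code;
    }
}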
6、Create the Redis sub-project
Used for all Redis operations.
Create the xz_bigdata_redis sub-module
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_redis</artifactId>
<name>xz_bigdata_redis</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<jedis.version>2.7.0</jedis.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>${jedis.version}</version>
</dependency>
</dependencies>
</project>
Create a new package com.hsiehchou.redis.client
Create the Redis connection class JedisSingle
JedisSingle.java
package com.hsiehchou.redis.client;
import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisConnectionException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.Properties;
public class JedisSingle {
private static final Logger LOG = LoggerFactory.getLogger(JedisSingle.class);
private static Properties redisConf;
/**
* 读取redis配置文件
* redis.hostname = 192.168.247.103
* redis.port = 6379
*/
static {
redisConf = ConfigUtil.getInstance().getProperties("redis/redis.properties");
System.out.println(redisConf);
}
public static Jedis getJedis(int db){
Jedis jedis = JedisSingle.getJedis();
if(jedis!=null){
jedis.select(db);
}
return jedis;
}
public static void main(String[] args) {
Jedis jedis = JedisSingle.getJedis(15);
Map<String, String> Map = jedis.hgetAll("phone:18609765435");
System.out.println(Map.toString());
}
public static Jedis getJedis(){
int timeoutCount = 0;
while (true) {// 如果是网络超时则多试几次
try
{
Jedis jedis = new Jedis(redisConf.get("redis.hostname").toString(),
Integer.valueOf(redisConf.get("redis.port").toString()));
return jedis;
} catch (Exception e)
{
if (e instanceof JedisConnectionException || e instanceof SocketTimeoutException)
{
timeoutCount++;
LOG.warn("获取jedis连接超时次数:" +timeoutCount);
if (timeoutCount > 4)
{
LOG.error("获取jedis连接超时次数a:" +timeoutCount);
LOG.error(null,e);
break;
}
}else
{
LOG.error("getJedis error", e);
break;
}
}
}
return null;
}
public static void close(Jedis jedis){
if(jedis!=null){
jedis.close();
}
}
}
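JedisSingle opens a brand-new connection on every call, which is fine for testing but wasteful under a streaming load. Below is a minimal pooled variant, sketched under the assumption that the same redis.properties keys are used; the pool sizes are illustrative only.
JedisPoolSingle.java
package com.hsiehchou.redis.client;
import com.hsiehchou.common.config.ConfigUtil;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import java.util.Properties;
public class JedisPoolSingle {
    private static final Properties CONF = ConfigUtil.getInstance().getProperties("redis/redis.properties");
    private static final JedisPool POOL;
    static {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(20); // illustrative maximum number of pooled connections
        config.setMaxIdle(5);
        POOL = new JedisPool(config,
                CONF.getProperty("redis.hostname"),
                Integer.parseInt(CONF.getProperty("redis.port")));
    }
    /** Borrow a connection and select the given db; close() it to return it to the pool. */
    public static Jedis getJedis(int db) {
        Jedis jedis = POOL.getResource();
        jedis.select(db);
        return jedis;
    }
}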
7、Create a scheduled task that synchronizes the rules to Redis
Create a new package com.hsiehchou.spark.warn.timer
Create SyncRule2Redis and WarnHelper
SyncRule2Redis.java
package com.hsiehchou.spark.warn.timer;
import java.util.TimerTask;
public class SyncRule2Redis extends TimerTask {
@Override
public void run() {
//这里定义同步方法
//就是读取mysql的数据 然后写入到redis中
System.out.println("========开始同步MYSQL规则到redis=======");
WarnHelper.syncRuleFromMysql2Redis();
System.out.println("============开始同步规则成功===========");
}
}
WarnHelper.java
package com.hsiehchou.spark.warn.timer;
import com.hsiehchou.redis.client.JedisSingle;
import com.hsiehchou.spark.warn.dao.XZ_RuleDao;
import com.hsiehchou.spark.warn.domain.XZ_RuleDomain;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;
import java.util.List;
public class WarnHelper {
private static final Logger LOG = LoggerFactory.getLogger(WarnHelper.class);
/**
* 同步mysql规则数据到redis
*/
public static void syncRuleFromMysql2Redis(){
//获取所有的规则
List<XZ_RuleDomain> ruleList = XZ_RuleDao.getRuleList();
Jedis jedis = null;
try {
//获取redis 客户端
jedis = JedisSingle.getJedis(15);
for (int i = 0; i < ruleList.size(); i++) {
XZ_RuleDomain rule = ruleList.get(i);
String id = rule.getId()+"";
String publisher = rule.getPublisher();
String warn_fieldname = rule.getWarn_fieldname();
String warn_fieldvalue = rule.getWarn_fieldvalue();
String send_mobile = rule.getSend_mobile();
String send_type = rule.getSend_type();
//拼接redis key值
String redisKey = warn_fieldname +":" + warn_fieldvalue;
//通过redis hash结构 hashMap
jedis.hset(redisKey,"id",StringUtils.isNoneBlank(id) ? id : "");
jedis.hset(redisKey,"publisher",StringUtils.isNoneBlank(publisher) ? publisher : "");
jedis.hset(redisKey,"warn_fieldname",StringUtils.isNoneBlank(warn_fieldname) ? warn_fieldname : "");
jedis.hset(redisKey,"warn_fieldvalue",StringUtils.isNoneBlank(warn_fieldvalue) ? warn_fieldvalue : "");
jedis.hset(redisKey,"send_mobile",StringUtils.isNoneBlank(send_mobile) ? send_mobile : "");
jedis.hset(redisKey,"send_type",StringUtils.isNoneBlank(send_type) ? send_type : "");
}
} catch (Exception e) {
LOG.error("同步规则到es失败",e);
} finally {
JedisSingle.close(jedis);
}
}
public static void main(String[] args)
{
WarnHelper.syncRuleFromMysql2Redis();
}
}
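为了验证 WarnHelper 写入的 hash 结构,可以用下面这个最小化的读取示例(纯演示代码,规则 key phone:18612345678 为假设值,前提是该规则已经同步到 db15):
package com.hsiehchou.spark.warn.timer;

import com.hsiehchou.redis.client.JedisSingle;
import redis.clients.jedis.Jedis;
import java.util.Map;

//仅用于验证的示例类(假设性代码),key 格式与 WarnHelper 中拼接的 warn_fieldname:warn_fieldvalue 一致
public class RuleReadExample {
    public static void main(String[] args) {
        Jedis jedis = JedisSingle.getJedis(15);
        try {
            //假设存在一条 warn_fieldname=phone、warn_fieldvalue=18612345678 的规则
            Map<String, String> rule = jedis.hgetAll("phone:18612345678");
            System.out.println(rule);
        } finally {
            JedisSingle.close(jedis);
        }
    }
}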
8、创建streaming流任务
scala/com/hsiehchou/spark/streaming/kafka/warn
WarningStreamingTask.scala
package com.hsiehchou.spark.streaming.kafka.warn
import java.util.Timer
import com.hsiehchou.redis.client.JedisSingle
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming.kafkaConfig
import com.hsiehchou.spark.warn.service.BlackRuleWarning
import com.hsiehchou.spark.warn.timer.SyncRule2Redis
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaManager
import redis.clients.jedis.Jedis
object WarningStreamingTask extends Serializable with Logging{
def main(args: Array[String]): Unit = {
//定义一个定时器去定时同步 MYSQL到REDIS
val timer : Timer = new Timer
//SyncRule2Redis 任务类
//0 第一次开始执行
//1*60*1000 隔多少时间执行一次
timer.schedule(new SyncRule2Redis,0,1*60*1000)
//从kafka中获取数据流
//kafka topic
val topics = "chl_test7".split(",")
//val ssc = SparkContextFactory.newSparkLocalStreamingContext("WarningStreamingTask1", java.lang.Long.valueOf(10),1)
val ssc:StreamingContext = SparkContextFactory.newSparkStreamingContext("WarningStreamingTask", java.lang.Long.valueOf(10))
//构建kafkaManager
val kafkaManager = new KafkaManager(
Spark_Kafka_ConfigUtil.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"), "WarningStreamingTask111")
)
//使用kafkaManager创建DStreaming流
val kafkaDS = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
//添加一个日期分组字段
//如果数据其他的转换,可以先在这里进行统一转换
.persist(StorageLevel.MEMORY_AND_DISK)
kafkaDS.foreachRDD(rdd=>{
//流量预警
//if(!rdd.isEmpty()){
/* val count_flow = rdd.map(x=>{
val flow = java.lang.Long.valueOf(x.get("collect_time"))
flow
}).reduce(_+_)
if(count_flow > 1719179595L){
println("流量预警: 阈值[1719179595L] 实际值:"+ count_flow)
}*/
//}
//客户端连接之类的对象最好不要放在RDD外面,因为处理partition时,数据需要分发到各个节点上去
//分发的数据必须可以序列化,如果不能序列化,分发会报错
//如果这个对象(包括它里面的内容)都可以序列化,那么可以直接放在RDD外面
var jedis:Jedis = null
try {
//jedis = JedisSingle.getJedis(15)
rdd.foreachPartition(partion => {
jedis = JedisSingle.getJedis(15)
while (partion.hasNext) {
val map = partion.next()
val table = map.get("table")
val mapObject = map.asInstanceOf[java.util.Map[String,Object]]
println(table)
//开始比对
BlackRuleWarning.blackWarning(mapObject,jedis)
}
})
} catch {
case e => e.printStackTrace()
} finally {
JedisSingle.close(jedis)
}
/* rdd.foreachPartition(partion => {
var jedis: Jedis = null
try {
jedis = JedisSingle.getJedis(15)
while (partion.hasNext) {
val map = partion.next()
val mapObject = map.asInstanceOf[java.util.Map[String, Object]]
//开始比对
BlackRuleWarning.blackWarning(mapObject, jedis)
}
} catch {
case e => logError(null,e)
}finally {
JedisSingle.close(jedis)
}
})*/
})
ssc.start()
ssc.awaitTermination()
}
}
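BlackRuleWarning 的实现不在本节代码中。下面给出一个按上面 redis key 规则(warn_fieldname:warn_fieldvalue)进行比对的简化示意,字段列表和命中后的处理方式均为假设,仅用于说明思路:
package com.hsiehchou.spark.warn.service;

import redis.clients.jedis.Jedis;
import java.util.Map;

//假设性示意:真正的 BlackRuleWarning 实现不在本节中
public class BlackRuleWarningSketch {
    //需要参与比对的字段,假设与规则里的 warn_fieldname 一致
    private static final String[] WARN_FIELDS = {"phone", "phone_mac"};

    public static void blackWarning(Map<String, Object> data, Jedis jedis) {
        for (String field : WARN_FIELDS) {
            Object value = data.get(field);
            if (value == null) {
                continue;
            }
            //与 WarnHelper 写入的 key 规则保持一致:warn_fieldname:warn_fieldvalue
            String redisKey = field + ":" + value;
            if (jedis.exists(redisKey)) {
                Map<String, String> rule = jedis.hgetAll(redisKey);
                //命中规则,这里只是打印;实际可按 send_type/send_mobile 发送短信或写入预警表
                System.out.println("命中黑名单规则:" + redisKey + " => " + rule);
            }
        }
    }
}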
9、执行
spark-submit \
--master local[1] \
--num-executors 1 \
--driver-memory 300m \
--executor-memory 500m \
--executor-cores 1 \
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
--class com.hsiehchou.spark.streaming.kafka.warn.WarningStreamingTask /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
10、截图
11、redis安装
解压:tar -zxvf redis-3.0.5.tar.gz
cd redis-3.0.5/
make
make PREFIX=/opt/software/redis install
redis-benchmark : Redis提供的压力测试工具。模拟产生客户端的压力
redis-check-aof : 检查aof日志文件
redis-check-dump : 检查rdb文件
redis-cli : Redis客户端脚本
redis-sentinel : 哨兵
redis-server : Redis服务器脚本
核心配置文件:redis.conf
[root@hsiehchou202 redis-3.0.5]# cp redis.conf /opt/software/redis
[root@hsiehchou202 redis]# mkdir conf
[root@hsiehchou202 redis]# mv redis.conf conf/
[root@hsiehchou202 conf]# vi redis.conf
42行 daemonize yes //后台方式运行
50行 port 6379
启动redis:./bin/redis-server conf/redis.conf
检测是否启动好
[root@hsiehchou202 redis]# ps -ef | grep redis
也可以用客户端验证:bin/redis-cli ping,返回 PONG 说明启动成功
十、Spark—kafka2hive
1、CDH启用Hive on spark
设置 hive on spark 参数
原来的HIVE执行引擎使用的是hadoop的mapreduce,Hive on Spark 就是将执行引擎换为spark引擎(即把 hive.execution.engine 设置为 spark)
2、hive配置文件
scala/com/hsiehchou/spark/streaming/kafka/kafka2hdfs/
HiveConfig.scala
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs
import java.util
import org.apache.commons.configuration.{CompositeConfiguration, ConfigurationException, PropertiesConfiguration}
import org.apache.spark.Logging
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import scala.collection.mutable.ArrayBuffer
import scala.collection.JavaConversions._
object HiveConfig extends Serializable with Logging {
//HIVE 文件根目录
var hive_root_path = "/apps/hive/warehouse/external/"
var hiveFieldPath = "es/mapping/fieldmapping.properties"
var config: CompositeConfiguration = null
//所有的表
var tables: util.List[_] = null
//表对应所有的字段映射,可以通过table名获取 这个table的所有字段
var tableFieldsMap: util.Map[String, util.HashMap[String, String]] = null
//StructType
var mapSchema: util.Map[String, StructType] = null
//建表语句
var hiveTableSQL: util.Map[String, String] = null
/**
* 主要就是创建mapSchema 和 hiveTableSQL
*/
initParams()
def main(args: Array[String]): Unit = {
}
/**
* 初始化HIVE参数
*/
def initParams(): Unit = {
//加载es/mapping/fieldmapping.properties 配置文件
config = HiveConfig.readCompositeConfiguration(hiveFieldPath)
println("==========================config====================================")
config.getKeys.foreach(key => {
println(key + ":" + config.getProperty(key.toString))
})
println("==========================tables====================================")
//wechat,mail,qq
tables = config.getList("tables")
tables.foreach(table => {
println(table)
})
var tables1 = config.getProperty("tables")
println("======================tableFieldsMap================================")
//(qq,{qq.imsi=string, qq.id=string, qq.send_message=string, qq.filename=string})
tableFieldsMap = HiveConfig.getKeysByType()
tableFieldsMap.foreach(x => {
println(x)
})
println("=========================mapSchema===================================")
mapSchema = HiveConfig.createSchema()
mapSchema.foreach(x => {
// val structType = x._2
// println("-----------")
// println(structType)
//
//
// val names = structType.fieldNames
// names.foreach(field => {
// println(field)
// })
println(x)
})
println("=========================hiveTableSQL===================================")
hiveTableSQL = HiveConfig.getHiveTables()
hiveTableSQL.foreach(x => {
println(x)
})
}
/**
* 读取hive 字段配置文件
* @param path
* @return
*/
def readCompositeConfiguration(path: String): CompositeConfiguration = {
logInfo("加载配置文件 " + path)
//多配置工具
val compositeConfiguration = new CompositeConfiguration
try {
val configuration = new PropertiesConfiguration(path)
compositeConfiguration.addConfiguration(configuration)
} catch {
case e: ConfigurationException => {
logError("加载配置文件 " + path + "失败", e)
}
}
logInfo("加载配置文件" + path + "成功。 ")
compositeConfiguration
}
/**
* 获取table-字段 对应关系
* 使用 util.Map[String,util.HashMap[String, String结构保存
* @return
*/
def getKeysByType(): util.Map[String, util.HashMap[String, String]] = {
val map = new util.HashMap[String, util.HashMap[String, String]]()
println("__________________tables_____________________"+tables)
//wechat, mail, qq
val iteratorTable = tables.iterator()
//对每个表进行遍历
while (iteratorTable.hasNext) {
//使用一个MAP保存一种对应关系
val fieldMap = new util.HashMap[String, String]()
//获取一个表
val table: String = iteratorTable.next().toString
//获取这个表的所有字段
val fields = config.getKeys(table)
//获取通用字段 这里暂时没有
val commonKeys: util.Iterator[String] = config.getKeys("common").asInstanceOf[util.Iterator[String]]
//将通用字段放到map结构中去
while (commonKeys.hasNext) {
val key = commonKeys.next()
fieldMap.put(key.replace("common", table), config.getString(key))
}
//将每种表的私有字段放到map中去
while (fields.hasNext) {
val field = fields.next().toString
fieldMap.put(field, config.getString(field))
println("__________________field_____________________"+"\n"+field)
}
map.put(table, fieldMap)
}
map
}
/**
* 构建建表语句
* 例如CREATE external TABLE IF NOT EXISTS qq (imei string,imsi string,longitude string,latitude string,phone_mac string,device_mac string,device_number string,collect_time string,username string,phone string,object_username string,send_message string,accept_message string,message_time string,id string,table string,filename string,absolute_filename string)
* @return
*/
def getHiveTables(): util.Map[String, String] = {
val hiveTableSqlMap: util.Map[String, String] = new util.HashMap[String, String]()
//获取每种数据的建表语句
tables.foreach(table => {
var sql: String = s"CREATE external TABLE IF NOT EXISTS ${table} ("
val tableFields = config.getKeys(table.toString)
tableFields.foreach(tableField => {
//qq.imsi=string, qq.id=string, qq.send_message=string
val fieldType = config.getProperty(tableField.toString)
val field = tableField.toString.split("\\.")(1)
sql = sql + field
fieldType match {
//就是将配置中的类型映射为HIVE 建表语句中的类型
case "string" => sql = sql + " string,"
case "long" => sql = sql + " string,"
case "double" => sql = sql + " string,"
case _ => println("Nothing Matched!!" + fieldType)
}
})
sql = sql.substring(0, sql.length - 1)
//sql = sql + s")STORED AS PARQUET location '${hive_root_path}${table}'"
sql = sql + s") partitioned by(year string,month string,day string) STORED AS PARQUET " + s"location '${hive_root_path}${table}'"
hiveTableSqlMap.put(table.toString, sql)
})
hiveTableSqlMap
}
/**
* 使用tableFieldsMap
* 对每种类型数据创建对应的Schema
* @return
*/
def createSchema(): util.Map[String, StructType] = {
// schema 表结构
/* CREATE TABLE `warn_message` (
//arrayStructType
`id` int(11) NOT NULL AUTO_INCREMENT,
`alarmRuleid` varchar(255) DEFAULT NULL,
`alarmType` varchar(255) DEFAULT NULL,
`sendType` varchar(255) DEFAULT NULL,
`sendMobile` varchar(255) DEFAULT NULL,
`sendEmail` varchar(255) DEFAULT NULL,
`sendStatus` varchar(255) DEFAULT NULL,
`senfInfo` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`hitTime` datetime DEFAULT NULL,
`checkinTime` datetime DEFAULT NULL,
`isRead` varchar(255) DEFAULT NULL,
`readAccounts` varchar(255) DEFAULT NULL,
`alarmaccounts` varchar(255) DEFAULT NULL,
`accountid` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=528 DEFAULT CHARSET=latin1;*/
val mapStructType: util.Map[String, StructType] = new util.HashMap[String, StructType]()
for (table <- tables) {
//通过tableFieldsMap 拿到这个表的所有字段
val tableFields = tableFieldsMap.get(table)
//对这个字段进行遍历
val keyIterator = tableFields.keySet().iterator()
//创建ArrayBuffer
var arrayStructType = ArrayBuffer[StructField]()
while (keyIterator.hasNext) {
val key = keyIterator.next()
val value = tableFields.get(key)
//将key拆分 获取 "."后面的部分作为数据字段
val field = key.split("\\.")(1)
value match {
/* case "string" => arrayStructType += StructField(field, StringType, true)
case "long" => arrayStructType += StructField(field, LongType, true)
case "double" => arrayStructType += StructField(field, DoubleType, true)*/
case "string" => arrayStructType += StructField(field, StringType, true)
case "long" => arrayStructType += StructField(field, StringType, true)
case "double" => arrayStructType += StructField(field, StringType, true)
case _ => println("Nothing Matched!!" + value)
}
}
val schema = StructType(arrayStructType)
mapStructType.put(table.toString, schema)
}
mapStructType
}
}
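HiveConfig 依赖 es/mapping/fieldmapping.properties 中约定的 key 格式:tables 列出所有表名,"表名.字段=类型" 描述每个字段。下面的小例子演示了这种假设的文件格式以及 commons-configuration 的读取方式(文件内容为示例,不代表真实配置):
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs;

import org.apache.commons.configuration.PropertiesConfiguration;
import java.util.Iterator;

//假设 fieldmapping.properties 内容类似:
//tables=wechat,mail,qq
//qq.imsi=string
//qq.send_message=string
//qq.collect_time=long
public class FieldMappingExample {
    public static void main(String[] args) throws Exception {
        PropertiesConfiguration config = new PropertiesConfiguration("es/mapping/fieldmapping.properties");
        //getList 会把逗号分隔的 tables 解析成列表
        System.out.println(config.getList("tables"));
        //getKeys("qq") 返回所有以 qq 开头的 key,例如 qq.imsi
        Iterator<String> keys = config.getKeys("qq");
        while (keys.hasNext()) {
            String key = keys.next();
            System.out.println(key + "=" + config.getString(key));
        }
    }
}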
3、kafka写hdfs和创建hive表
Kafka2HiveTest.scala
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs
import java.util
import com.hsiehchou.hdfs.HdfsAdmin
import com.hsiehchou.hive.HiveConf
import com.hsiehchou.spark.common.{SparkContextFactory}
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming.kafkaConfig
import org.apache.hadoop.fs.Path
import org.apache.spark.{Logging}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.{DataFrame, Row, SaveMode}
import org.apache.spark.sql.types.StructType
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaManager
import scala.collection.JavaConversions._
object Kafka2HiveTest extends Serializable with Logging{
val topics = "chl_test7".split(",")
//获取所有数据类型
//获取所有数据的Schema
def main(args: Array[String]): Unit = {
//val ssc = SparkContextFactory.newSparkLocalStreamingContext("XZ_kafka2es", java.lang.Long.valueOf(10),1)
val ssc = SparkContextFactory.newSparkStreamingContext("Kafka2HiveTest", java.lang.Long.valueOf(10))
//1.创建HIVE表 建表SQL已经在HiveConfig中拼好了
val sc = ssc.sparkContext
val hiveContext: HiveContext = HiveConf.getHiveContext(sc)
hiveContext.setConf("spark.sql.parquet.mergeSchema", "true")
createHiveTable(hiveContext)
//kafka拿到流数据
val kafkaDS = new KafkaManager(Spark_Kafka_ConfigUtil
.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"),
"Kafka2HiveTest"))
.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
.persist(StorageLevel.MEMORY_AND_DISK)
HiveConfig.tables.foreach(table=>{
//过滤出单一数据类型(获取和table相同类型的所有数据)
val tableDS = kafkaDS.filter(x => {table.equals(x.get("table"))})
//获取数据类型的schema 表结构
val schema = HiveConfig.mapSchema.get(table)
//获取这个表的所有字段
val schemaFields: Array[String] = schema.fieldNames
tableDS.foreachRDD(rdd=>{
//TODO 数据写入HDFS
/* val sc = rdd.sparkContext
val hiveContext = HiveConf.getHiveContext(sc)
hiveContext.sql(s"USE DEFAULT")*/
//将RDD转为DF 原因:要加字段描述,写比较方便
val tableDF = rdd2DF(rdd,schemaFields,hiveContext,schema)
//多种数据一起处理
val path_all = s"hdfs://hadoop1:8020${HiveConfig.hive_root_path}${table}"
val exists = HdfsAdmin.get().getFs.exists(new Path(path_all))
//2.写到HDFS 不管存不存在我们都要把数据写入进去 通过追加的方式
//每10秒写一次,写一次会生成一个文件
tableDF.write.mode(SaveMode.Append).parquet(path_all)
//3.加载数据到HIVE
if (!exists) {
//如果不存在 进行首次加载
System.out.println("===================开始加载数据到分区=============")
hiveContext.sql(s"ALTER TABLE ${table} LOCATION '${path_all}'")
}
})
})
ssc.start()
ssc.awaitTermination()
}
/**
* 创建HIVE表
* @param hiveContext
*/
def createHiveTable(hiveContext: HiveContext): Unit ={
val keys = HiveConfig.hiveTableSQL.keySet()
keys.foreach(key=>{
val sql = HiveConfig.hiveTableSQL.get(key)
//通过hiveContext 和已经创建好的SQL语句去创建HIVE表
hiveContext.sql(sql)
println(s"创建表${key}成功")
})
}
/**
* 将RDD转为DF
* @param rdd
* @param schemaFields
* @param hiveContext
* @param schema
* @return
*/
def rdd2DF(rdd:RDD[util.Map[String,String]],
schemaFields: Array[String],
hiveContext:HiveContext,
schema:StructType): DataFrame ={
//将RDD[Map[String,String]]转为RDD[ROW]
val rddRow = rdd.map(record => {
val listRow: util.ArrayList[Object] = new util.ArrayList[Object]()
for (schemaField <- schemaFields) {
listRow.add(record.get(schemaField))
}
Row.fromSeq(listRow)
//所有分区合并成一个
}).repartition(1)
//构建DF
//def createDataFrame(rowRDD: RDD[Row], schema: StructType)
val typeDF = hiveContext.createDataFrame(rddRow, schema)
typeDF
}
}
4、Kafka2HiveTest 执行
spark-submit \
--master local[1] \
--num-executors 1 \
--driver-memory 300m \
--executor-memory 500m \
--executor-cores 1 \
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
--class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.Kafka2HiveTest /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
5、xz_bigdata_spark/src/java/
com/hsiehchou/hdfs
HdfsAdmin.java—HDFS 文件操作类
package com.hsiehchou.hdfs;
import com.hsiehchou.common.adjuster.StringAdjuster;
import com.hsiehchou.common.file.FileCommon;
import com.google.common.base.Preconditions;
import com.google.common.collect.Lists;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.log4j.Logger;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.reflect.Array;
import java.util.Collection;
import java.util.List;
/**
* HDFS 文件操作类
*/
public class HdfsAdmin {
private static Logger LOG = Logger.getLogger(HdfsAdmin.class);
private static final String HDFS_SITE = "/hadoop/hdfs-site.xml";
private static final String CORE_SITE = "/hadoop/core-site.xml";
private volatile static HdfsAdmin hdfsAdmin;
private FileSystem fs;
private HdfsAdmin(Configuration conf, Logger logger){
try {
if(conf == null) conf = newConf();
conf.set("fs.defaultFS","hdfs://hadoop1:8020");
fs = FileSystem.get(conf);
} catch (IOException e) {
LOG.error("获取 hdfs的FileSystem出现异常。", e);
}
Preconditions.checkNotNull(fs, "没有获取到可用的Hdfs的FileSystem");
this.LOG = logger;
if(this.LOG == null)
this.LOG = Logger.getLogger(HdfsAdmin.class);
}
private Configuration newConf(){
Configuration conf = new Configuration();
if(FileCommon.exist(HDFS_SITE)) conf.addResource(HDFS_SITE);
if(FileCommon.exist(CORE_SITE)) conf.addResource(CORE_SITE);
return conf;
}
public static HdfsAdmin get(){
return get(null);
}
/**
* 获取hdfsAdmin
* @param logger
* @return
*/
public static HdfsAdmin get(Logger logger){
if(hdfsAdmin == null){
synchronized (HdfsAdmin.class){
if(hdfsAdmin == null) hdfsAdmin = new HdfsAdmin(null, logger);
}
}
return hdfsAdmin;
}
public static HdfsAdmin get(Configuration conf, Logger logger){
if(hdfsAdmin == null){
synchronized (HdfsAdmin.class){
if(hdfsAdmin == null) hdfsAdmin = new HdfsAdmin(conf, logger);
}
}
return hdfsAdmin;
}
public FileStatus getFileStatus(String dir) {
FileStatus fileStatus = null;
try {
fileStatus = fs.getFileStatus(new Path(dir));
} catch (IOException e) {
LOG.error(String.format("获取文件 %s信息失败。", dir), e);
}
return fileStatus;
}
public void createFile(String dst , byte[] contents){
//目标路径
Path dstPath = new Path(dst);
//打开一个输出流
FSDataOutputStream outputStream;
try {
outputStream = fs.create(dstPath);
outputStream.write(contents);
outputStream.flush();
outputStream.close();
} catch (IOException e) {
LOG.error(String.format("创建文件 %s 失败。", dst), e);
}
LOG.info(String.format("文件: %s 创建成功!", dst));
}
//上传本地文件
public void uploadFile(String src,String dst){
//原路径
Path srcPath = new Path(src);
//目标路径
Path dstPath = new Path(dst);
//调用文件系统的文件复制函数,前面参数是指是否删除原文件,true为删除,默认为false
try {
fs.copyFromLocalFile(false,srcPath, dstPath);
} catch (IOException e) {
LOG.error(String.format("上传文件 %s 到 %s 失败。", src, dst), e);
}
//打印文件路径
LOG.info(String.format("上传文件 %s 到 %s 完成。", src, dst));
}
public void downloadFile(String src , String dst){
Path dstPath = new Path(dst) ;
try {
fs.copyToLocalFile(false, new Path(src), dstPath);
} catch (IOException e) {
LOG.error(String.format("下载文件 %s 到 %s 失败。", src, dst), e);
}
LOG.info(String.format("下载文件 %s 到 %s 完成", src, dst));
}
//文件重命名
public void rename(String oldName,String newName){
Path oldPath = new Path(oldName);
Path newPath = new Path(newName);
boolean isok = false;
try {
isok = fs.rename(oldPath, newPath);
} catch (IOException e) {
LOG.error(String.format("重命名文件 %s 为 %s 失败。", oldName, newName), e);
}
if(isok){
LOG.info(String.format("重命名文件 %s 为 %s 完成。", oldName, newName));
}else{
LOG.error(String.format("重命名文件 %s 为 %s 失败。", oldName, newName));
}
}
public void delete(String path){
delete(path, true);
}
//删除文件
public void delete(String path, boolean recursive){
Path deletePath = new Path(path);
boolean isok = false;
try {
isok = fs.delete(deletePath, recursive);
} catch (IOException e) {
LOG.error(String.format("删除文件 %s 失败。", path), e);
}
if(isok){
LOG.info(String.format("删除文件 %s 完成。", path));
}else{
LOG.error(String.format("删除文件 %s 失败。", path));
}
}
//创建目录
public void mkdir(String path){
Path srcPath = new Path(path);
boolean isok = false;
try {
isok = fs.mkdirs(srcPath);
} catch (IOException e) {
LOG.error(String.format("创建目录 %s 失败。", path), e);
}
if(isok){
LOG.info(String.format("创建目录 %s 完成。", path));
}else{
LOG.error(String.format("创建目录 %s 失败。", path));
}
}
//读取文件的内容
public InputStream readFile(String filePath){
Path srcPath = new Path(filePath);
InputStream in = null;
try {
in = fs.open(srcPath);
} catch (IOException e) {
LOG.error(String.format("读取文件 %s 失败。", filePath), e);
}
return in;
}
public <T> void readFile(String filePath, StringAdjuster<T> adjuster, Collection<T> result){
InputStream inputStream = readFile(filePath);
if(inputStream != null){
InputStreamReader reader = new InputStreamReader(inputStream);
BufferedReader bufferedReader = new BufferedReader(reader);
String line;
try {
T t;
while((line = bufferedReader.readLine()) != null){
t = adjuster.doAdjust(line);
if(t != null)result.add(t);
}
} catch (IOException e) {
LOG.error(String.format("利用缓冲流读取文件 %s 失败。", filePath), e);
}finally {
IOUtils.closeQuietly(bufferedReader);
IOUtils.closeQuietly(reader);
IOUtils.closeQuietly(inputStream);
}
}
}
public List<String> readLines(String filePath){
return readLines(filePath, "UTF-8");
}
public List<String> readLines(String filePath, String encoding){
InputStream inputStream = readFile(filePath);
List<String> lines = null;
if(inputStream != null) {
try {
lines = IOUtils.readLines(inputStream, encoding);
} catch (IOException e) {
LOG.error(String.format("按行读取文件 %s 失败。", filePath), e);
}finally {
IOUtils.closeQuietly(inputStream);
}
}
return lines;
}
public List<FileStatus> findNewFileOrDirInDir(String dir, HdfsFileFilter filter,
final boolean onlyFile, final boolean onlyDir){
return findNewFileOrDirInDir(dir, filter, onlyFile, onlyDir, false);
}
public List<FileStatus> findNewFileOrDirInDir(String dir, HdfsFileFilter filter,
final boolean onlyFile, final boolean onlyDir, boolean recursive){
if(onlyFile && onlyDir){
FileStatus fileStatus = getFileStatus(dir);
if(fileStatus == null)return Lists.newArrayList();
if(isAccepted(fileStatus,filter)){
return Lists.newArrayList(fileStatus);
}
return Lists.newArrayList();
}
if(onlyFile){
return findNewFileInDir(dir, filter, recursive);
}
if(onlyDir){
return findNewDirInDir(dir, filter, recursive);
}
return Lists.newArrayList();
}
/**
* 查找一个文件夹中 新建的目录
* @param dir
* @param filter
* @return
*/
public List<FileStatus> findNewDirInDir(String dir, HdfsFileFilter filter){
return findNewDirInDir(new Path(dir), filter, false);
}
public List<FileStatus> findNewDirInDir(Path path, HdfsFileFilter filter){
return findNewDirInDir(path, filter, false);
}
public List<FileStatus> findNewDirInDir(String dir, HdfsFileFilter filter, boolean recursive){
return findNewDirInDir(new Path(dir), filter, recursive);
}
public List<FileStatus> findNewDirInDir(Path path, HdfsFileFilter filter, boolean recursive){
FileStatus[] files = null;
try {
files = fs.listStatus(path);
} catch (IOException e) {
LOG.error(String.format("获取目录 %s下的文件列表失败。", path), e);
}
if(files == null)return Lists.newArrayList();
List<FileStatus> paths = Lists.newArrayList();
List<String> res = Lists.newArrayList();
for(FileStatus fileStatus : files){
if (fileStatus.isDirectory()) {
if (isAccepted(fileStatus, filter)) {
paths.add(fileStatus);
res.add(fileStatus.getPath().toString());
}else if(recursive){
paths.addAll(findNewDirInDir(fileStatus.getPath(), filter, recursive));
}
}
}
LOG.info(String.format("从目录%s 找到满足条件%s 有如下 %s 个文件: %s",
path, filter,res.size(), res));
return paths;
}
/**
* 查找一个文件夹中 新建的文件
* @param dir
* @param filter
* @return
*/
public List<FileStatus> findNewFileInDir(String dir, HdfsFileFilter filter){
return findNewFileInDir(new Path(dir), filter, false);
}
public List<FileStatus> findNewFileInDir(String dir, HdfsFileFilter filter, boolean recursive){
return findNewFileInDir(new Path(dir), filter, recursive);
}
public List<FileStatus> findNewFileInDir(Path path, HdfsFileFilter filter){
return findNewFileInDir(path, filter, false);
}
public List<FileStatus> findNewFileInDir(Path path, HdfsFileFilter filter, boolean recursive){
FileStatus[] files = null;
try {
files = fs.listStatus(path);
} catch (IOException e) {
LOG.error(String.format("获取目录 %s下的文件列表失败。", path), e);
}
if(files == null)return Lists.newArrayList();
List<FileStatus> paths = Lists.newArrayList();
List<String> res = Lists.newArrayList();
for(FileStatus fileStatus : files){
if (fileStatus.isFile()) {
if (isAccepted(fileStatus, filter)) {
paths.add(fileStatus);
res.add(fileStatus.getPath().toString());
}
}else if(recursive){
paths.addAll(findNewFileInDir(fileStatus.getPath(), filter, recursive));
}
}
LOG.info(String.format("从目录%s 找到满足条件%s 有如下 %s 个文件: %s", path, filter,res.size(), res));
return paths;
}
private boolean isAccepted(String file, HdfsFileFilter filter) {
if(filter == null) return true;
FileStatus fileStatus = getFileStatus(file);
if(fileStatus == null)return false;
return isAccepted(fileStatus, filter);
}
private boolean isAccepted(FileStatus fileStatus, HdfsFileFilter filter) {
return filter == null ? true : filter.filter(fileStatus);
}
public long getModificationTime(Path path){
try {
FileStatus status = fs.getFileStatus(path);
return status.getModificationTime();
} catch (IOException e) {
LOG.error(String.format("获取路径 %s信息失败。", path), e);
}
return -1L;
}
public FileSystem getFs() {
return fs;
}
public static void main(String[] args) throws Exception {
// HdfsAdmin hdfsAdmin = HdfsAdmin.get();
// hdfsAdmin.mkdir("hdfs://hdp04.ultiwill.com:8020/test1111");
//System.out.println(hdfsAdmin.getFs().exists(new Path("hdfs://hdp04.ultiwill.com:8020/test")));
//hdfsAdmin.delete("hdfs://hdp04.ultiwill.com:8020/test1111");
//System.out.println("hdfsAdmin = " + );
// List<FileStatus> status = hdfsAdmin.findNewDirInDir("hdfs://hdp04.ultiwill.com:50070/hdp", null);
//System.out.println("status = " + status.size());
}
}
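HdfsAdmin 的 main 方法基本都被注释掉了,这里补一个最小使用示意(HDFS 地址与文件路径均为假设值):
package com.hsiehchou.hdfs;

import java.util.List;

//HdfsAdmin 使用示意,hdfs://hadoop1:8020 与具体路径均为假设
public class HdfsAdminExample {
    public static void main(String[] args) {
        HdfsAdmin hdfsAdmin = HdfsAdmin.get();
        //创建目录并上传本地文件
        hdfsAdmin.mkdir("hdfs://hadoop1:8020/test/demo");
        hdfsAdmin.uploadFile("/tmp/demo.txt", "hdfs://hadoop1:8020/test/demo/demo.txt");
        //按行读取刚上传的文件
        List<String> lines = hdfsAdmin.readLines("hdfs://hadoop1:8020/test/demo/demo.txt");
        System.out.println(lines);
    }
}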
HdfsFileFilter.java
package com.hsiehchou.hdfs;
import com.hsiehchou.common.filter.Filter;
import org.apache.hadoop.fs.FileStatus;
public abstract class HdfsFileFilter implements Filter<FileStatus> {
}
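Filter 接口定义在 xz_bigdata_common 中,本节没有贴出。结合 HdfsAdmin.isAccepted 里 filter.filter(fileStatus) 的用法可以推断其核心是 boolean filter(T t),下面是一个按修改时间过滤的具体实现作为示意(过滤条件为假设):
package com.hsiehchou.hdfs;

import org.apache.hadoop.fs.FileStatus;

//示意:过滤出"最近一小时内有变化"的文件/目录
public class RecentModifiedFilter extends HdfsFileFilter {
    private final long sinceMillis = System.currentTimeMillis() - 60 * 60 * 1000L;

    @Override
    public boolean filter(FileStatus fileStatus) {
        return fileStatus.getModificationTime() >= sinceMillis;
    }
}
配合 HdfsAdmin 使用,例如:HdfsAdmin.get().findNewFileInDir("/apps/hive/warehouse/external/qq", new RecentModifiedFilter(), true)。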
com/hsiehchou/hive
HiveConf.java
package com.hsiehchou.hive;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.hive.HiveContext;
import java.util.Iterator;
import java.util.Map;
public class HiveConf {
//private static String DEFUALT_CONFIG = "spark/hive/hive-server-config";
private static HiveConf hiveConf;
private static HiveContext hiveContext;
private HiveConf(){
}
public static HiveConf getHiveConf(){
if(hiveConf==null){
synchronized (HiveConf.class){
if(hiveConf==null){
hiveConf=new HiveConf();
}
}
}
return hiveConf;
}
public static HiveContext getHiveContext(SparkContext sparkContext){
if(hiveContext==null){
synchronized (HiveConf.class){
if(hiveContext==null){
hiveContext = new HiveContext(sparkContext);
Configuration conf = new Configuration();
conf.addResource("spark/hive/hive-site.xml");
Iterator<Map.Entry<String, String>> iterator = conf.iterator();
while (iterator.hasNext()) {
Map.Entry<String, String> next = iterator.next();
hiveContext.setConf(next.getKey(), next.getValue());
}
hiveContext.setConf("spark.sql.parquet.mergeSchema", "true");
}
}
}
return hiveContext;
}
}
6、小文件合并
scala/com/hsiehchou/spark/streaming/kafka/kafka2hdfs
CombineHdfs.scala—合并HDFS小文件任务
package com.hsiehchou.spark.streaming.kafka.kafka2hdfs
import com.hsiehchou.hdfs.HdfsAdmin
import com.hsiehchou.spark.common.SparkContextFactory
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.Logging
import org.apache.spark.sql.{SQLContext, SaveMode}
import scala.collection.JavaConversions._
/**
* 合并HDFS小文件任务
*/
object CombineHdfs extends Serializable with Logging{
def main(args: Array[String]): Unit = {
// val sparkContext = SparkContextFactory.newSparkBatchContext("CombineHdfs")
val sparkContext = SparkContextFactory.newSparkLocalBatchContext("CombineHdfs")
//创建一个 sparkSQL
val sqlContext: SQLContext = new SQLContext(sparkContext)
//遍历表 就是遍历HIVE表
HiveConfig.tables.foreach(table=>{
//获取HDFS文件目录
//例如 /apps/hive/warehouse/external/mail
val table_path =s"${HiveConfig.hive_root_path}$table"
//通过sparkSQL 加载 这些目录的文件
val tableDF = sqlContext.read.load(table_path)
//先获取原来目录中的所有文件(HDFS文件 API)
val fileSystem:FileSystem = HdfsAdmin.get().getFs
//通过globStatus 获取目录下的正则匹配文件
//fileSystem.listFiles()
val arrayFileStatus = fileSystem.globStatus(new Path(table_path+"/part*"))
//stat2Paths将文件状态转为文件路径 这个文件路径是用来删除的
val paths = FileUtil.stat2Paths(arrayFileStatus)
//写入合并文件 //repartition 需要根据生产中实际情况去定义
tableDF.repartition(1).write.mode(SaveMode.Append).parquet(table_path)
println("写入" + table_path +"成功")
//删除小文件
paths.foreach(path =>{
HdfsAdmin.get().getFs.delete(path)
println("删除文件" + path + "成功")
})
})
}
}
7、定时任务
命令行输入:crontab -e
内容(crontab 中整条命令要写在同一行,spark-submit 建议写绝对路径;下面表示每天凌晨1点执行一次合并任务):
0 1 * * * spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.CombineHdfs /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
说明:crontab 的格式为 * * * * * 加上要执行的任务,五个“*”的含义如下:
项目 | 含义 | 范围 |
---|---|---|
第一个“*” | 一小时当中的第几分钟(分) | 0-59 |
第二个“*” | 一天当中的第几小时(时) | 0-23 |
第三个“*” | 一个月当中的第几天(天) | 1-31 |
第四个“*” | 一年当中的第几月(月) | 1-12 |
第五个“*” | 一周当中的星期几(周) | 0-7(0和7都代表星期日) |
8、合并小文件截图
9、hive命令
show tables;
hdfs dfs -ls /apps/hive/warehouse/external
hdfs dfs -rm -r /apps/hive/warehouse/external/mail
drop table mail;
desc qq;
select * from qq limit 1;
select count(*) from qq;
启动 /usr/bin 下面的zookeeper客户端
zookeeper-client
删除zookeeper里面的消费者数据
rmr /consumers/WarningStreamingTask2/offsets
rmr /consumers/Kafka2HiveTest/offsets
rmr /consumers/DataRelationStreaming1/offsets
十一、Spark—Kafka2Hbase
1、数据关联
(1)为什么需要关联
问题:我们不能充分了解数据之间的关联关系,这类关联需求在公司中应用得非常多。
传统做法是离线关联:数据放在 mysql 里,通过关联字段去关联。但是如果数据量非常大、关联表非常多,就处理不了。
而且数据零散时只能从单一维度去看数据,看到的面比较窄;如果需要从多个维度分析,关联成本比较大。
建立数据之间的关联关系,实现关联查询的毫秒级响应;
另一个方面,可以为数据挖掘,机器学习提供训练数据。
后面进行机器学习的时候,都需要从多维度对数据进行分析和建模。
(2)HBASE 中只要rowkey一样,那么它们就是同一条数据(不同来源的数据会合并到同一行,见下面的示意代码)
QQ
aa-aa-aa-aa-aa-aa 666666
微信
aa-aa-aa-aa-aa-aa weixin
邮箱
aa-aa-aa-aa-aa-aa 666666@qq.com
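下面用一小段示意代码说明这一点:对同一个 rowkey 写入来自不同数据源的列,HBase 会把它们合并成同一行(表名 test:relation、列名均为示例):
package com.hsiehchou.hbase.insert;

import org.apache.hadoop.hbase.client.Put;

//示意:同一 rowkey 的多次写入在 HBase 中合并为同一条逻辑数据
public class SameRowkeyExample {
    public static void main(String[] args) throws Exception {
        Put put = new Put("aa-aa-aa-aa-aa-aa".getBytes());
        put.addColumn("cf".getBytes(), "qq".getBytes(), "666666".getBytes());
        put.addColumn("cf".getBytes(), "wechat".getBytes(), "weixin".getBytes());
        put.addColumn("cf".getBytes(), "send_mail".getBytes(), "666666@qq.com".getBytes());
        HBaseInsertHelper.put("test:relation", put);
    }
}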
(3)如何关联
一对一的情况 :
https://blog.csdn.net/shujuelin/article/details/83657485
使用HBASE写入特性
比如 MAC1 1789932321
MAC1 88888@qq.com
MAC1 88888
一对多的情况怎么处理
使用多版本
aa-aa-aa-aa-aa-aa 666666
aa-aa-aa-aa-aa-aa 777777
(4)一对多
使用多版本存一对多的关系。
如果使用默认的时间戳做版本:插入一个777777是一个版本,再插入一个777777又是一个版本,同一个值会重复出现。
所以需要自定义版本号,保证同一个值对应唯一的版本。
做法是通过 "888888".hashCode() & Integer.MAX_VALUE 把值映射成一个固定的正整数作为版本号,如下面的示意代码。
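下面是自定义版本号的一个最小演示(字段名和值为示例):
package com.hsiehchou.hbase.insert;

//示意:用 (字段名 + 字段值).hashCode() & Integer.MAX_VALUE 生成确定的正整数版本号
//同一个值重复写入会落在同一个版本上,从而避免多版本里出现重复数据
public class VersionNumExample {
    public static void main(String[] args) {
        long v1 = ("phone" + "777777").hashCode() & Integer.MAX_VALUE;
        long v2 = ("phone" + "777777").hashCode() & Integer.MAX_VALUE;
        long v3 = ("phone" + "888888").hashCode() & Integer.MAX_VALUE;
        System.out.println(v1 == v2); //true,重复值覆盖同一版本
        System.out.println(v1 == v3); //几乎总是 false,不同值对应不同版本(极小概率哈希冲突)
    }
}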
(5)如何实现hbase多字段查询
往主关联表 test:relation 里面写入数据 rowkey=>aa-aa-aa-aa-aa-aa version=>1637094383 类型phone_mac value=>aa-aa-aa-aa-aa-aa
往二级索引表 test:phone_mac里面写入数据 rowkey=>aa-aa-aa-aa-aa-aa version=>1736188717 value=>aa-aa-aa-aa-aa-aa
查询不直接查主关联表,因为查询字段不在主键里面,没办法查或者性能非常低下。
查询分为两步rowkey查询(两步的示意代码见下):
第一步,通过查询字段到对应的二级索引表里面去找主关联表的ROWKEY
第二步,通过主关联表的ROWKEY获取HBASE中的全量数据
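下面是两步查询的一个简化示意(只演示一对一的情况,一对多时需要按多版本读取;手机号等取值为假设):
package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.config.HBaseTableUtil;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

//示意:先查二级索引表 test:phone 拿到 mac,再用 mac 查主关联表 test:relation
public class SecondaryIndexQueryExample {
    public static void main(String[] args) throws Exception {
        //第一步:用手机号到索引表里找主关联表的 rowkey(即 phone_mac)
        Table phoneIndex = HBaseTableUtil.getTable("test:phone");
        Result indexResult = phoneIndex.get(new Get("18612345678".getBytes()));
        byte[] mac = indexResult.getValue("cf".getBytes(), "phone_mac".getBytes());
        HBaseTableUtil.close(phoneIndex);
        if (mac == null) {
            System.out.println("索引表中没有该手机号");
            return;
        }
        //第二步:用 rowkey(mac) 到主关联表取全量关联数据
        Table relation = HBaseTableUtil.getTable("test:relation");
        Result all = relation.get(new Get(mac));
        for (Cell cell : all.rawCells()) {
            System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)) + "=" + Bytes.toString(CellUtil.cloneValue(cell)));
        }
        HBaseTableUtil.close(relation);
    }
}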
WIFI 已经入库的情况下,手机号也必须已经入库了,才能找到
假如WIFI的手机号还没有入库
如果是基础数据先过来,没有mac,也就没有主键
Card | phone |
---|---|
400000000000000 | 18612345678 |
关联
Phone | value (识别这个字段是身份证才可以) |
---|---|
18612345678 | 400000000000000 |
1)因为检索的时候都是通过索引表直接找MAC,这里却混入了身份证这类非MAC的值
2)所以后续要对这类数据做一次合并
(6)关联及二级索引示意
(7)如何使用ES建立二级索引
如果hbase里面有100个字段,存放的是全量信息,但是只有20个字段参与查询、检索,那么我们可以把这20个字段单独提出来存放到es中,因为ES对多字段、多条件查询非常灵活。所以我们可以先在ES中按条件进行检索,根据检索结果拿到hbase的rowkey,然后再通过rowkey到hbase里面获取全量信息(示意代码见下)。
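下面是一个简化示意:假设已经在 ES 中按条件查到了一批 rowkey(这里直接写死),再批量回查 HBase 主关联表取全量数据:
package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.config.HBaseTableUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

//示意:ES 查询部分省略,rowkey 列表为假设值,实际来自 ES 检索结果
public class EsRowkey2HbaseExample {
    public static void main(String[] args) throws Exception {
        List<String> rowkeysFromEs = Arrays.asList("aa-aa-aa-aa-aa-aa", "bb-bb-bb-bb-bb-bb");
        List<Get> gets = new ArrayList<>();
        for (String rowkey : rowkeysFromEs) {
            gets.add(new Get(rowkey.getBytes()));
        }
        Table relation = HBaseTableUtil.getTable("test:relation");
        //批量 get,一次拿回所有命中 rowkey 的全量信息
        Result[] results = relation.get(gets);
        for (Result result : results) {
            System.out.println(result);
        }
        HBaseTableUtil.close(relation);
    }
}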
(8)Hbase 预分区
主要是根据rowkey分布来进行预分区
分区主要是为了防止热点问题
以relation表为例
这个表的rowkey就是mac
phone_mac 都是以0-9 a-f开头的
device_mac 都是以0-9 a-z开头的
Hbase 的rowkey是按字典序排序的,所以可以按rowkey的首字符预先切分region(生成split key的示意代码见下)
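SpiltRegionUtil.getSplitKeysBydinct 的实现不在本节中,下面按 0-9、a-f 首字符给出一个生成 split key 的简化示意(切分粒度为假设),生成的 splits 可以传给 HBaseTableUtil.createTable 的分区参数:
package com.hsiehchou.hbase.spilt;

//示意:phone_mac 以 0-9、a-f 开头,可按首字符生成 16 个 region 的 split key
public class SplitKeysExample {
    public static byte[][] hexSplitKeys() {
        //15 个 split key 切出 16 个 region,第一个 region 覆盖 '0' 开头的 rowkey
        char[] heads = "123456789abcdef".toCharArray();
        byte[][] splits = new byte[heads.length][];
        for (int i = 0; i < heads.length; i++) {
            splits[i] = new byte[]{(byte) heads[i]};
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : hexSplitKeys()) {
            System.out.println(new String(split));
        }
    }
}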
(9)自定义版本号
通过这样的转换我们可以精确定位到数据的某个版本,然后可以根据版本号对数据进行多版本删除。例如字符串 aaaaaaaa 经哈希映射后会得到一个类似 156511 的正整数版本号。
2、DataRelationStreaming—数据关联
DataRelationStreaming.scala
package com.hsiehchou.spark.streaming.kafka.kafka2hbase
import java.util.Properties
import com.hsiehchou.common.config.ConfigUtil
import com.hsiehchou.hbase.config.HBaseTableUtil
import com.hsiehchou.hbase.insert.HBaseInsertHelper
import com.hsiehchou.hbase.spilt.SpiltRegionUtil
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaManager
object DataRelationStreaming extends Serializable with Logging{
// 读取需要关联的配置文件字段
// phone_mac,phone,username,send_mail,imei,imsi
val relationFields = ConfigUtil.getInstance()
.getProperties("spark/relation.properties")
.get("relationfield")
.toString
.split(",")
def main(args: Array[String]): Unit = {
//初始化hbase表
//initRelationHbaseTable(relationFields)
val ssc = SparkContextFactory.newSparkLocalStreamingContext("DataRelationStreaming", java.lang.Long.valueOf(10),1)
// val ssc = SparkContextFactory.newSparkStreamingContext("DataRelationStreaming", java.lang.Long.valueOf(10))
val kafkaConfig: Properties = ConfigUtil.getInstance().getProperties("kafka/kafka-server-config.properties")
val topics = "chl_test7".split(",")
val kafkaDS = new KafkaManager(Spark_Kafka_ConfigUtil
.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"),
"DataRelationStreaming2"))
.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
.persist(StorageLevel.MEMORY_AND_DISK)
kafkaDS.foreachRDD(rdd=>{
rdd.foreachPartition(partion=>{
//对partion进行遍历
while (partion.hasNext){
//获取每一条流数据
val map = partion.next()
//获取mac 主键
var phone_mac:String = map.get("phone_mac")
//获取所有关联字段 //phone_mac,phone,username,send_mail,imei,imsi
relationFields.foreach(relationFeild =>{
//relationFields 是需要进行关联处理的字段,所以要判断
//map中是不是包含这个字段,如果包含的话,取出来进行处理
if(map.containsKey(relationFeild)){
//创建主关联,并遍历关联字段进行关联
val put = new Put(phone_mac.getBytes())
//取关联字段的值
//TODO 到这里 主关联表的 主键和值都有了 然后封装成PUT写入hbase主关联表就行了
val value = map.get(relationFeild)
//自定义版本号 通过 (表字段名 + 字段值 取hashCOde)
//因为值有可能是字符串,但是版本号必须是long类型,所以这里我们需要
//将字符串映射成唯一的数字,而且必须是正整数
val versionNum = (relationFeild+value).hashCode() & Integer.MAX_VALUE
put.addColumn("cf".getBytes(), Bytes.toBytes(relationFeild),versionNum ,Bytes.toBytes(value.toString))
HBaseInsertHelper.put("test:relation",put)
println(s"往主关联表 test:relation 里面写入数据 rowkey=>${phone_mac} version=>${versionNum} 类型${relationFeild} value=>${value}")
// 建立二级索引
// 使用关联字段的值作为二级索引的rowkey
// 二级索引就是把这个字段的值作为索引表rowkey
// 把这个字段的mac做为索引表的值
val put_2 = new Put(value.getBytes())//把这个字段的值作为索引表rowkey
val table_name = s"test:${relationFeild}"//往索引表里面去写
//使用主表的rowkey 就是 取hash作为二级索引的版本号
val versionNum_2 = phone_mac.hashCode() & Integer.MAX_VALUE
put_2.addColumn("cf".getBytes(), Bytes.toBytes("phone_mac"),versionNum_2 ,Bytes.toBytes(phone_mac.toString))
HBaseInsertHelper.put(table_name,put_2)
println(s"往二级索表 ${table_name}里面写入数据 rowkey=>${value} version=>${versionNum_2} value=>${phone_mac}")
}
})
}
})
})
ssc.start()
ssc.awaitTermination()
}
def initRelationHbaseTable(relationFields:Array[String]): Unit ={
//初始化总关联表
val relation_table = "test:relation"
HBaseTableUtil.createTable(relation_table,
"cf",
true,
-1,
100,
SpiltRegionUtil.getSplitKeysBydinct)
//HBaseTableUtil.deleteTable(relation_table)
//遍历所有关联字段,根据字段创建二级索引表
relationFields.foreach(field=>{
val hbase_table = s"test:${field}"
HBaseTableUtil.createTable(hbase_table, "cf", true, -1, 100, SpiltRegionUtil.getSplitKeysBydinct)
// HBaseTableUtil.deleteTable(hbase_table)
})
}
}
3、com.hsiehchou.spark.streaming
common/SparkContextFactory.scala
package com.hsiehchou.spark.common
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{Accumulator, SparkContext}
object SparkContextFactory {
def newSparkBatchContext(appName:String = "sparkBatch") : SparkContext = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
new SparkContext(sparkConf)
}
def newSparkLocalBatchContext(appName:String = "sparkLocalBatch" , threads : Int = 2) : SparkContext = {
val sparkConf = SparkConfFactory.newSparkLoalConf(appName, threads)
sparkConf.set("","")
new SparkContext(sparkConf)
}
def getAccumulator(appName:String = "sparkBatch") : Accumulator[Int] = {
val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
val accumulator: Accumulator[Int] = new SparkContext(sparkConf).accumulator(0,"")
accumulator
}
/**
* 创建本地流streamingContext
* @param appName appName
* @param batchInterval 多少秒读取一次
* @param threads 开启多少个线程
* @return
*/
def newSparkLocalStreamingContext(appName:String = "sparkStreaming" ,
batchInterval:Long = 30L ,
threads : Int = 4) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkLocalConf(appName, threads)
// sparkConf.set("spark.streaming.receiver.maxRate","10000")
sparkConf.set("spark.streaming.kafka.maxRatePerPartition","1")
new StreamingContext(sparkConf, Seconds(batchInterval))
}
/**
* 创建集群模式streamingContext
* 这里不设置线程数,在submit中指定
* @param appName
* @param batchInterval
* @return
*/
def newSparkStreamingContext(appName:String = "sparkStreaming" , batchInterval:Long = 30L) : StreamingContext = {
val sparkConf = SparkConfFactory.newSparkStreamingConf(appName)
new StreamingContext(sparkConf, Seconds(batchInterval))
}
def startSparkStreaming(ssc:StreamingContext){
ssc.start()
ssc.awaitTermination()
ssc.stop()
}
}
streaming/kafka/Spark_Kafka_ConfigUtil.scala
package com.hsiehchou.spark.streaming.kafka
import org.apache.spark.Logging
object Spark_Kafka_ConfigUtil extends Serializable with Logging{
def getKafkaParam(brokerList:String,groupId : String): Map[String,String]={
val kafkaParam=Map[String,String](
"metadata.broker.list" -> brokerList,
"auto.offset.reset" -> "smallest",
"group.id" -> groupId,
"refresh.leader.backoff.ms" -> "1000",
"num.consumer.fetchers" -> "8")
kafkaParam
}
}
4、com/hsiehchou/common/config/ConfigUtil
ConfigUtil.java
package com.hsiehchou.common.config;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
public class ConfigUtil {
private static Logger LOG = LoggerFactory.getLogger(ConfigUtil.class);
private static ConfigUtil configUtil;
public static ConfigUtil getInstance(){
if(configUtil == null){
configUtil = new ConfigUtil();
}
return configUtil;
}
public Properties getProperties(String path){
Properties properties = new Properties();
try {
LOG.info("开始加载配置文件" + path);
InputStream insss = this.getClass().getClassLoader().getResourceAsStream(path);
properties = new Properties();
properties.load(insss);
} catch (IOException e) {
LOG.info("加载配置文件" + path + "失败");
LOG.error(null,e);
}
LOG.info("加载配置文件" + path + "成功");
System.out.println("文件内容:"+properties);
return properties;
}
public static void main(String[] args) {
ConfigUtil instance = ConfigUtil.getInstance();
Properties properties = instance.getProperties("common/datatype.properties");
//Properties properties = instance.getProperties("spark/relation.properties");
// properties.get("relationfield");
System.out.println(properties);
}
}
5、构建模块—xz_bigdata_hbase
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata2</artifactId>
<groupId>com.hsiehchou</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_hbase</artifactId>
<name>xz_bigdata_hbase</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<hbase.version>1.2.0</hbase.version>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_resources</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>guava</artifactId>
<groupId>com.google.guava</groupId>
</exclusion>
<exclusion>
<artifactId>zookeeper</artifactId>
<groupId>org.apache.zookeeper</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}-${cdh.version}</version>
<exclusions>
<exclusion>
<artifactId>servlet-api-2.5</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>
com/hsiehchou/hbase/config/HBaseConf.java
package com.hsiehchou.hbase.config;
import com.hsiehchou.hbase.spilt.SpiltRegionUtil;
import org.apache.commons.configuration.CompositeConfiguration;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.io.Serializable;
public class HBaseConf implements Serializable {
private static final long serialVersionUID = 1L;
private static final Logger LOG = Logger.getLogger(HBaseConf.class);
private static final String HBASE_SERVER_CONFIG = "hbase/hbase-server-config.properties";
private static final String HBASE_SITE = "hbase/hbase-site.xml";
private volatile static HBaseConf hbaseConf;
private CompositeConfiguration hbase_server_config;
public CompositeConfiguration getHbase_server_config() {
return hbase_server_config;
}
public void setHbase_server_config(CompositeConfiguration hbase_server_config) {
this.hbase_server_config = hbase_server_config;
}
//hbase 配置文件
private Configuration configuration;
//hbase 连接
private volatile transient Connection conn;
/**
* 初始化HBaseConf的时候加载配置文件
*/
private HBaseConf() {
hbase_server_config = new CompositeConfiguration();
//加载配置文件
loadConfig(HBASE_SERVER_CONFIG,hbase_server_config);
//初始化连接
getHconnection();
}
//获取连接
public Configuration getConfiguration(){
if(configuration==null){
configuration = HBaseConfiguration.create();
configuration.addResource(HBASE_SITE);
LOG.info("加载配置文件" + HBASE_SITE + "成功");
}
return configuration;
}
public BufferedMutator getBufferedMutator(String tableName) throws IOException {
return getHconnection().getBufferedMutator(TableName.valueOf(tableName));
}
public Connection getHconnection(){
if(conn==null){
//获取配置文件
getConfiguration();
synchronized (HBaseConf.class) {
if (conn == null) {
try {
conn = ConnectionFactory.createConnection(configuration);
} catch (IOException e) {
LOG.error(String.format("获取hbase的连接失败 参数为: %s", toString()), e);
}
}
}
}
return conn;
}
/**
* 加载配置文件
* @param path
* @param configuration
*/
private void loadConfig(String path,CompositeConfiguration configuration) {
try {
LOG.info("加载配置文件 " + path);
configuration.addConfiguration(new PropertiesConfiguration(path));
LOG.info("加载配置文件" + path +"成功。 ");
} catch (ConfigurationException e) {
LOG.error("加载配置文件 " + path + "失败", e);
}
}
/**
* 单例 初始化HBaseConf
* @return
*/
public static HBaseConf getInstance() {
if (hbaseConf == null) {
synchronized (HBaseConf.class) {
if (hbaseConf == null) {
hbaseConf = new HBaseConf();
}
}
}
return hbaseConf;
}
public static void main(String[] args) {
String hbase_table = "test:chl_test2";
HBaseTableUtil.createTable(hbase_table, "cf", true, -1, 1, SpiltRegionUtil.getSplitKeysBydinct());
/* Connection hconnection = HBaseConf.getInstance().getHconnection();
Connection hconnection1 = HBaseConf.getInstance().getHconnection();
System.out.println(hconnection);
System.out.println(hconnection1);*/
}
}
com/hsiehchou/hbase/config/HBaseTableFactory.java
package com.hsiehchou.hbase.config;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Table;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.Serializable;
public class HBaseTableFactory implements Serializable {
private static final long serialVersionUID = -1071596337076137201L;
private static final Logger LOG = LoggerFactory.getLogger(HBaseTableFactory.class);
private HBaseConf conf;
private transient Connection conn ;
private boolean isReady = true;
public HBaseTableFactory(){
conf = HBaseConf.getInstance();
conn = conf.getHconnection();
if(conn == null){
isReady = false;
LOG.warn("HBase 连接没有启动。");
}
}
public HBaseTableFactory(Connection conn){
this.conn = conn;
}
/**
* 根据表名创建 表的实例
* @param tableName
* @return
* @throws IOException
* HTableInterface
*/
public Table getHBaseTableInstance(String tableName) throws IOException{
if(conn == null){
if(conf == null){
conf = HBaseConf.getInstance();
isReady = true;
LOG.warn("HBaseConf为空,重新初始化。");
}
synchronized (HBaseTableFactory.class) {
if(conn == null) {
conn = conf.getHconnection();
LOG.warn("初始 hbase Connection 为空 , 获取 Connection成功。");
}
}
}
return isReady ? conn.getTable(TableName.valueOf(tableName)) : null;
}
public HTable getHTable(String tableName) throws IOException{
return (HTable) getHBaseTableInstance(tableName);
}
public BufferedMutator getBufferedMutator(String tableName) throws IOException {
return getConf().getBufferedMutator(tableName);
}
public boolean isReady() {
return isReady;
}
private HBaseConf getConf(){
if(conf == null){
conf = HBaseConf.getInstance();
}
return conf;
}
public void close() throws IOException{
conn.close();
conn = null;
}
}
com/hsiehchou/hbase/config/HBaseTableUtil.java
package com.hsiehchou.hbase.config;
import com.google.common.collect.Sets;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.*;
import static com.google.common.base.Preconditions.checkArgument;
public class HBaseTableUtil {
private static final Logger LOG = LoggerFactory.getLogger(HBaseTableUtil.class);
private static final String COPROCESSORCLASSNAME = "org.apache.hadoop.hbase.coprocessor.AggregateImplementation";
private static HBaseConf conf = HBaseConf.getInstance() ;
private HBaseTableUtil(){}
/**
* 获取hbase 表连接
* @param tableName
* @return
*/
public static Table getTable(String tableName){
Table table =null;
if(tableExists(tableName)){
try {
table = conf.getHconnection().getTable(TableName.valueOf(tableName));
} catch (IOException e) {
LOG.error(null,e);
}
}
return table;
}
public static void close(Table table){
if(table != null) {
try {
table.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
/**
* 判断 HBase中是否存在 名为 tableName 的表
* @param tableName
* @return boolean
*/
public static boolean tableExists(String tableName){
boolean isExists = false;
try {
isExists = conf.getHconnection().getAdmin().tableExists(TableName.valueOf(tableName));
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}
return isExists;
}
/**
* 删除表
* @param tableName
* @return
*/
public static boolean deleteTable(String tableName){
boolean status = false;
TableName name = TableName.valueOf(tableName);
try {
Admin admin = conf.getHconnection().getAdmin();
if(admin.tableExists(name)){
if(!admin.isTableDisabled(name)){
admin.disableTable(name);
}
admin.deleteTable(name);
}else{
LOG.warn(" HBase中不存在 表 " + tableName);
}
admin.close();
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}
return status;
}
/**
* 清空表
* @param tableName
* @return
*/
public static boolean truncateTable(String tableName){
boolean status = false;
TableName name = TableName.valueOf(tableName);
try {
Admin admin = conf.getHconnection().getAdmin();
if(admin.tableExists(name)){
if(admin.isTableAvailable(name)){
admin.disableTable(name);
}
admin.truncateTable(name, true);
}else{
LOG.warn(" HBase中不存在 表 " + tableName);
}
admin.close();
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}
return status;
}
/**
* 创建HBase表
* @param tableName
* @param cf 列族名
* @param inMemory
* @param ttl ttl < 0 则为永久保存
*/
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, COPROCESSORCLASSNAME);
return createTable(htd);
}
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion, boolean useSNAPPY){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY , COPROCESSORCLASSNAME);
return createTable(htd);
}
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion, boolean useSNAPPY, byte[][] splits){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY, COPROCESSORCLASSNAME);
return createTable(htd , splits);
}
/**
* @param tableName 表名
* @param cf 列簇
* @param inMemory 是否存在内存
* @param ttl 数据过期时间
* @param maxVersion 最大版本
* @param splits 分区
* @return
*/
public static boolean createTable(String tableName,
String cf,
boolean inMemory,
int ttl,
int maxVersion,
byte[][] splits){
//返回表说明
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, COPROCESSORCLASSNAME);
//通过HTableDescriptor 和 splits 分区策略来定义表
return createTable(htd , splits);
}
public static List<String> listTables(){
List<String> list = new ArrayList<String>();
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
TableName[] listTableNames = admin.listTableNames();
for( TableName t : listTableNames ){
list.add( t.getNameAsString() );
}
} catch(IOException e ) {
LOG.error("创建HBase表失败。", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return list;
}
/**
* 列出所有表
* @param reg
* @return
*/
public static List<String> listTables(String reg){
List<String> list = new ArrayList<String>();
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
TableName[] listTableNames = admin.listTableNames(reg);
for(TableName t : listTableNames){
list.add(t.getNameAsString());
}
} catch(IOException e) {
LOG.error("创建HBase表失败。", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return list;
}
/**
* 创建HBase表
* @param tableName
* @param cf 列族名
* @param inMemory
* @param ttl ttl < 0 则为永久保存
*/
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl , int maxVersion, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, coprocessorClassNames);
return createTable(htd);
}
public static boolean createTable( String tableName, String cf, boolean inMemory, int ttl, int maxVersion, boolean useSNAPPY, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY, coprocessorClassNames);
return createTable(htd);
}
public static boolean createTable( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion , boolean useSNAPPY ,byte[][] splits, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY ,coprocessorClassNames);
return createTable(htd,splits );
}
public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion, byte[][] splits, String ... coprocessorClassNames){
HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, coprocessorClassNames);
return createTable(htd,splits );
}
/**
* 通过HTableDescriptor 和 分区 来构建hbase
* @param htd
* @param splits
* @return
*/
public static boolean createTable(HTableDescriptor htd, byte[][] splits){
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
TableName tableName = htd.getTableName();
boolean exist = admin.tableExists(tableName);
if(exist){
LOG.error("表"+tableName.getNameAsString() + "已经存在");
}else{
//使用Admin进行创建表
admin.createTable(htd, splits);
}
} catch(IOException e ) {
LOG.error("创建HBase表失败。", e);
return false;
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return true;
}
public static boolean createTable(HTableDescriptor htd){
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
if(admin.tableExists(htd.getTableName())){
LOG.info("表" + htd.getTableName() + "已经存在");
}else{
admin.createTable(htd);
}
} catch(IOException e ) {
LOG.error("创建HBase表失败。", e);
return false;
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return true;
}
/**
* 创建命名空间
* @param nameSpace
* @return
*/
public static boolean createNameSpace(String nameSpace){
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
NamespaceDescriptor[] listNamespaceDescriptors = admin.listNamespaceDescriptors();
boolean exist = false;
for(NamespaceDescriptor namespaceDescriptor : listNamespaceDescriptors){
if(namespaceDescriptor.getName().equals(nameSpace)){
exist = true;
}
}
if(!exist) admin.createNamespace(NamespaceDescriptor.create(nameSpace).build());
} catch(IOException e ) {
LOG.error("创建HBase命名空间失败。", e);
return false;
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return true;
}
/**
* 为 HBase中的表 tableName添加 协处理器 coprocessorClassName
* @param tableName
* @param coprocessorClassName 必须是已经存在与HBase集群中
* @return boolean
*/
public static boolean addCoprocessorClassForTable(String tableName,String coprocessorClassName){
boolean status = false;
TableName name = TableName.valueOf(tableName);
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
HTableDescriptor htd = admin.getTableDescriptor(name);
if(!htd.hasCoprocessor(coprocessorClassName)){
htd.addCoprocessor(coprocessorClassName);
admin.disableTable(name);
admin.modifyTable(name, htd);
admin.enableTable(name);
}else{
LOG.warn(String.format("表 %s中已经存在协处理器%s", tableName, coprocessorClassName));
}
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return status;
}
/**
* 为HBase中的表 tableName添加指定位置的 协处理器 jar
* @param tableName
* @param coprocessorClassName jar中的具体的协处理器
* @param jarPath hdfs的路径
* @param level 执行级别
* @param kvs 运行参数 可以为 null
* @return boolean
*/
public static boolean addCoprocessorJarForTable(String tableName, String coprocessorClassName,String jarPath,int level ,Map<String, String> kvs ){
boolean status = false;
TableName name = TableName.valueOf(tableName);
Admin admin = null;
try {
admin = conf.getHconnection().getAdmin();
HTableDescriptor htd = admin.getTableDescriptor(name);
if(!htd.hasCoprocessor(coprocessorClassName)){
admin.disableTable(name);
htd.addCoprocessor(coprocessorClassName, new Path(jarPath), level, kvs);
admin.modifyTable(name, htd);
admin.enableTable(name);
}else{
LOG.warn(String.format("表 %s中已经存在协处理器%s", tableName, coprocessorClassName));
}
status = true;
} catch (MasterNotRunningException e) {
LOG.error("HBase master 未运行 。 ", e);
} catch (ZooKeeperConnectionException e) {
LOG.error("zooKeeper 连接异常。 ", e);
} catch (IOException e) {
LOG.error("", e);
}finally{
try {
if(admin!=null){
admin.close();
}
} catch (IOException e) {
LOG.error("", e);
}
}
return status;
}
/**
* @param tableName
* @param cf
* @param inMemory
* @param ttl
* @param maxVersion
* @param coprocessorClassNames
* @return
*/
public static HTableDescriptor createHTableDescriptor( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion ,String ... coprocessorClassNames ){
return createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, true, coprocessorClassNames);
}
/**
* @param tableName
* @param cf
* @param inMemory
* @param ttl
* @param maxVersion
* @param useSNAPPY
* @param coprocessorClassNames
* @return
*/
public static HTableDescriptor createHTableDescriptor( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion , boolean useSNAPPY , String ... coprocessorClassNames ){
// 1.创建命名空间
String[] split = tableName.split(":");
if(split.length==2){
createNameSpace(split[0]);
}
// 2.添加协处理器
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));
for( String coprocessorClassName : coprocessorClassNames ){
try {
htd.addCoprocessor(coprocessorClassName);
} catch (IOException e1) {
LOG.error("为表" + tableName + " 添加协处理器失败。 ", e1);
}
}
// 创建HColumnDescriptor
HColumnDescriptor hcd = new HColumnDescriptor(cf);
if( maxVersion > 0 )
//定义最大版本号
hcd.setMaxVersions(maxVersion);
/**
* 设置布隆过滤器
* 默认是NONE 是否使用布隆过滤及使用何种方式
* 布隆过滤可以每列族单独启用
* Default = ROW 对行键进行布隆过滤。
* 对 ROW,行键的哈希在每次插入行时将被添加到布隆。
* 对 ROWCOL,行键 + 列族 + 列限定符的哈希将在每次插入行时添加到布隆
* 使用方法: create 'table', {BLOOMFILTER => 'ROW'}
* 启用布隆过滤可以节省读磁盘过程,可以有助于降低读取延迟
* */
hcd.setBloomFilterType(BloomType.ROWCOL);
/**
* hbase在LRU缓存基础之上采用了分层设计,整个blockcache分成了三个部分,分别是single、multi和inMemory。三者区别如下:
* single:如果一个block第一次被访问,放在该优先队列中;
* multi:如果一个block被多次访问,则从single队列转移到multi队列
* inMemory:优先级最高,常驻cache,因此一般只有hbase系统的元数据,如meta表之类的才会放到inMemory队列中。普通的hbase列族也可以指定IN_MEMORY属性,方法如下:
* create 'table', {NAME => 'f', IN_MEMORY => true}
* 修改上表的inmemory属性,方法如下:
* alter 'table',{NAME=>'f',IN_MEMORY=>true}
* */
hcd.setInMemory(inMemory);
hcd.setScope(1);
/**
* 数据量大,边压边写也会提升性能的,毕竟IO是大数据的最严重的瓶颈,
* 哪怕使用了SSD也是一样。众多的压缩方式中,推荐使用SNAPPY。从压缩率和压缩速度来看,
* 性价比最高。
**/
if(useSNAPPY)hcd.setCompressionType(Compression.Algorithm.SNAPPY);
//默认为NONE
//如果数据存储时设置了编码, 在缓存到内存中的时候是不会解码的,这样和不编码的情况相比,相同的数据块,编码后占用的内存更小, 即提高了内存的使用率
//如果设置了编码,用户必须在取数据的时候进行解码, 因此在内存充足的情况下会降低读写性能。
//在任何情况下开启PREFIX_TREE编码都是安全的
//不要同时开启PREFIX_TREE和SNAPPY
//通常情况下 SNAPPY并不能比 PREFIX_TREE取得更好的优化效果
//hcd.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);
//默认为64k 65536
//随着blocksize的增大, 系统随机读的吞吐量不断的降低,延迟也不断的增大,
//64k大小比16k大小的吞吐量大约下降13%,延迟增大13%
//128k大小比64k大小的吞吐量大约下降22%,延迟增大27%
//对于随机读取为主的业务,可以考虑调低blocksize的大小
//随着blocksize的增大, scan的吞吐量不断的增大,延迟也不断降低,
//64k大小比16k大小的吞吐量大约增加33%,延迟降低24%
//128k大小比64k大小的吞吐量大约增加7%,延迟降低7%
//对于scan为主的业务,可以考虑调大blocksize的大小
//如果业务请求以Get为主,则可以适当的减小blocksize的大小
//如果业务是以scan请求为主,则可以适当的增大blocksize的大小
//系统默认为64k, 是一个scan和get之间取的平衡值
//hcd.setBlocksize(s)
//设置表中数据的存储生命期,过期数据将自动被删除,
// 例如如果只需要存储最近两天的数据,
// 那么可以设置setTimeToLive(2 * 24 * 60 * 60)
if( ttl < 0 ) ttl = HConstants.FOREVER;
hcd.setTimeToLive(ttl);
htd.addFamily( hcd);
return htd;
}
public static boolean createTable(HBaseTableParam param){
String nameSpace = param.getNameSpace();
if(!"default".equalsIgnoreCase(nameSpace)){
checkArgument(createNameSpace(nameSpace), String.format("创建命名空间%s失败。", nameSpace));
}
HTableDescriptor desc = createHTableDescriptor(param);
byte[][] splits = param.getSplits();
if(splits == null){
return createTable(desc);
}else{
return createTable(desc, splits);
}
}
public static HTableDescriptor createHTableDescriptor(HBaseTableParam param){
String tableName = String.format("%s:%s", param.getNameSpace(), param.getTableName());
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));
for(String coprocessorClassName : param.getCoprocessorClazz()){
try {
htd.addCoprocessor(coprocessorClassName);
} catch (IOException e) {
LOG.error(String.format("为表 %s 添加协处理器失败。", tableName), e);
}
}
HColumnDescriptor hcd = new HColumnDescriptor(param.getCf());
hcd.setBloomFilterType(param.getBloomType());
hcd.setMaxVersions(param.getMaxVersions());
hcd.setScope(param.getReplicationScope());
hcd.setBlocksize(param.getBlocksize());
hcd.setInMemory(param.isInMemory());
hcd.setTimeToLive(param.getTtl());
/* 数据量大,边压边写也会提升性能的,毕竟IO是大数据的最严重的瓶颈,哪怕使用了SSD也是一样。众多的压缩方式中,推荐使用SNAPPY。从压缩率和压缩速度来看,性价比最高。 */
if(param.isUsePrefix_tree())hcd.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);
if(param.isUseSnappy())hcd.setCompressionType(Compression.Algorithm.SNAPPY);
htd.addFamily( hcd);
return htd;
}
public static void closeTable( Table table ){
if( table != null ){
try {
table.close();
} catch (IOException e) {
LOG.error(" ", e);
}
table = null;
}
}
public static byte[][] getSplitKeys() {
//String[] keys = new String[]{"50|"};
//String[] keys = new String[]{"25|","50|","75|"};
//String[] keys = new String[]{"13|","26|","39|", "52|","65|","78|","90|"};
String[] keys = new String[]{ "06|","13|","20|", "26|","33|", "39|","46|", "52|","58|", "65|","72|","78|", "84|","90|","95|"};
//String[] keys = new String[]{"10|", "20|", "30|", "40|", "50|", "60|", "70|", "80|", "90|"};
byte[][] splitKeys = new byte[keys.length][];
TreeSet<byte[]> rows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);//升序排序
for (int i = 0; i < keys.length; i++) {
rows.add(Bytes.toBytes(keys[i]));
}
Iterator<byte[]> rowKeyIter = rows.iterator();
int i = 0;
while (rowKeyIter.hasNext()) {
byte[] tempRow = rowKeyIter.next();
rowKeyIter.remove();
splitKeys[i] = tempRow;
i++;
}
return splitKeys;
}
public static class HBaseTableParam{
private final String nameSpace; //命名空间
private final String tableName; //表名
private final String cf; //列簇
private Set<String> coprocessorClazz = Sets.newHashSet("org.apache.hadoop.hbase.coprocessor.AggregateImplementation");
private int maxVersions = 1; //版本号 默认为1
private BloomType bloomType = BloomType.ROWCOL;
private boolean inMemory = false;
private int replicationScope = 1;
private boolean useSnappy = false; //默认不使用压缩
private boolean usePrefix_tree = false;
private int blocksize = 65536;
private int ttl = HConstants.FOREVER;
private byte[][] splits;
public HBaseTableParam(String nameSpace, String tableName, String cf) {
super();
this.nameSpace = nameSpace == null ? "default" : nameSpace;
this.tableName = tableName;
this.cf = cf;
}
public String getNameSpace() {
return nameSpace;
}
public String getTableName() {
return tableName;
}
public String getCf() {
return cf;
}
public Set<String> getCoprocessorClazz() {
return coprocessorClazz;
}
public void clearCoprocessor(){
coprocessorClazz.clear();
}
public void addCoprocessorClazz(String clazz) {
this.coprocessorClazz.add(clazz);
}
public void addCoprocessorClazz(String ... clazz) {
addCoprocessorClazz(Arrays.asList(clazz));
}
public void addCoprocessorClazz(Collection<String> clazz) {
this.coprocessorClazz.addAll(clazz);
}
public int getMaxVersions() {
return maxVersions;
}
public void setMaxVersions(int maxVersions) {
this.maxVersions = maxVersions <= 0 ? 1 : maxVersions;
}
public BloomType getBloomType() {
return bloomType;
}
public void setBloomType(BloomType bloomType) {
this.bloomType = bloomType == null ? BloomType.ROWCOL : bloomType;
}
public boolean isInMemory() {
return inMemory;
}
public void setInMemory(boolean inMemory) {
this.inMemory = inMemory;
}
public int getReplicationScope() {
return replicationScope;
}
public void setReplicationScope(int replicationScope) {
this.replicationScope = replicationScope < 0 ? 1 : replicationScope;
}
public boolean isUseSnappy() {
return useSnappy;
}
/**
* 控制是否使用 snappy 压缩数据, 默认是不启用
* @param useSnappy
*/
public void setUseSnappy(boolean useSnappy) {
this.useSnappy = useSnappy;
}
public boolean isUsePrefix_tree() {
return usePrefix_tree;
}
/**
* 控制是否使用数据编码,默认是不使用
*
* 如果数据存储时设置了编码, 在缓存到内存中的时候是不会解码的,这样和不编码的情况相比,相同的数据块,编码后占用的内存更小, 即提高了内存的使用率
* 如果设置了编码,用户必须在取数据的时候进行解码, 因此在内存充足的情况下会降低读写性能。
* 在任何情况下开启PREFIX_TREE编码都是安全的
* 不要同时开启PREFIX_TREE和SNAPPY
* 通常情况下 SNAPPY并不能比 PREFIX_TREE取得更好的优化效果
*/
public void setUsePrefix_tree(boolean usePrefix_tree) {
this.usePrefix_tree = usePrefix_tree;
}
public int getBlocksize() {
return blocksize;
}
/**
*默认为64k 65536
*随着blocksize的增大, 系统随机读的吞吐量不断的降低,延迟也不断的增大,
*64k大小比16k大小的吞吐量大约下降13%,延迟增大13%
*128k大小比64k大小的吞吐量大约下降22%,延迟增大27%
*对于随机读取为主的业务,可以考虑调低blocksize的大小
*
*随着blocksize的增大, scan的吞吐量不断的增大,延迟也不断降低,
*64k大小比16k大小的吞吐量大约增加33%,延迟降低24%
*128k大小比64k大小的吞吐量大约增加7%,延迟降低7%
*对于scan为主的业务,可以考虑调大blocksize的大小
*
*如果业务请求以Get为主,则可以适当的减小blocksize的大小
*如果业务是以scan请求为主,则可以适当的增大blocksize的大小
*系统默认为64k, 是一个scan和get之间取的平衡值
*
*/
public void setBlocksize(int blocksize) {
this.blocksize = blocksize <= 0 ? 65536 : blocksize;
}
public int getTtl() {
return ttl;
}
/**
* 默认是永久保存
* @param ttl 大于 零的整数, <= 0 ? tt 为 永久保存
*/
public void setTtl(int ttl) {
this.ttl = ttl <= 0 ? HConstants.FOREVER : ttl;
}
public byte[][] getSplits() {
return splits;
}
/**
* 预分区的rowKey范围配置
* @param splits
*/
public void setSplits(byte[][] splits) {
this.splits = splits;
}
}
public static void main(String[] args) throws Exception{
Admin admin = conf.getHconnection().getAdmin();
System.out.println(admin);
//deleteTable("test:user");
// HBaseTableUtil.createTable("aaaaa","info1",true,-1,1);
// HBaseTableUtil.truncateTable("aaaaa");
/* boolean b = tableExists("test:user2");
Table table = getTable("test:user2");
System.out.println("=================="+table);
System.out.println("=================="+table.getName());*/
//HBaseTableUtil.deleteTable("aaaaa");
/* Table table = HBaseTableUtil.getTable("countform:typecount");
System.out.println(table);*/
/*
boolean b = HBaseTableUtil.tableExists("countform:typecount");
System.out.println(b);*/
HBaseTableUtil.deleteTable("tanslator");
HBaseTableUtil.deleteTable("ability");
HBaseTableUtil.deleteTable("task");
HBaseTableUtil.deleteTable("paper");
// HbaseSearchService hbaseSearchService=new HbaseSearchService();
// Map<String, String> stringStringMap = hbaseSearchService.get("countform:bsid","", new BaseMapRowExtrator());
// Map<String, String> aaaaa = hbaseSearchService.get("countform:bsid", "aaaaa", new BaseMapRowExtrator());
// System.out.println(aaaaa);
}
}
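下面补充一个调用 HBaseTableUtil 建表的最小示意(非项目源码,表名 test:wechat、列簇 cf、TTL 等参数均为假设,仅演示 createHTableDescriptor 与预分区建表的配合方式):
package com.hsiehchou.hbase.config;

import org.apache.hadoop.hbase.HTableDescriptor;

//示意代码:演示 HBaseTableUtil 的典型建表调用
public class CreateTableDemo {
    public static void main(String[] args) {
        //TTL为30天、最多保留100个版本、开启SNAPPY压缩
        HTableDescriptor htd = HBaseTableUtil.createHTableDescriptor(
                "test:wechat", "cf", false, 30 * 24 * 60 * 60, 100, true);
        //用类内定义的split keys做预分区,避免写入热点
        boolean ok = HBaseTableUtil.createTable(htd, HBaseTableUtil.getSplitKeys());
        System.out.println("建表结果:" + ok);
    }
}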
com/hsiehchou/hbase/entity/AbstractRow.java
package com.hsiehchou.hbase.entity;
import com.google.common.collect.HashMultimap;
import com.google.common.collect.Sets;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
public abstract class AbstractRow<T extends HBaseCell> {
protected String rowKey;
protected HashMultimap<String, T> cells;
protected Set<String> fields;
protected long maxCapTime;
public AbstractRow(String rowKey){
this.rowKey = rowKey;
cells = HashMultimap.create();
fields = Sets.newHashSet();
}
public boolean addCell(String field, String value, long capTime){
return addCell(field, createCell(field, value, capTime));
}
public boolean addCell(String field, T cell){
fields.add(cell.getField());
if(cell.getCapTime() > maxCapTime)
maxCapTime = cell.getCapTime();
return cells.put(field, cell);
}
public boolean[] addCell(String field, Collection<T> cells){
boolean[] status = new boolean[cells.size()];
int n = 0;
for(T cell : cells){
status[n] = addCell(field, cell);
n++;
}
return status;
}
public String getRowKey() {
return rowKey;
}
protected abstract T createCell(String field, String value, long capTime);
public Map<String, Collection<T>> getCell() {
return cells.asMap();
}
public Collection<T> getCellByField(String field){
return cells.get(field);
}
public Set<Map.Entry<String, T>> entries(){
return cells.entries();
}
@Override
public String toString() {
return "AbstractRow [rowKey=" + rowKey + ", cells=" + cells + "]";
}
public boolean equals(Object obj) {
if(this == obj)return true ;
if(!(obj instanceof AbstractRow))return false ;
@SuppressWarnings("unchecked")
AbstractRow<T> row = (AbstractRow<T>) obj;
if(rowKey.equals(row.getRowKey()))return true;
return false;
}
public int hashCode(){
return this.rowKey.hashCode();
}
public long getMaxCapTime() {
return maxCapTime;
}
public Set<String> getFields() {
return Sets.newHashSet(fields);
}
}
com/hsiehchou/hbase/entity/HBaseCell.java
package com.hsiehchou.hbase.entity;
public class HBaseCell implements Comparable<HBaseCell>{
protected String field;
protected String value;
protected Long capTime;
public HBaseCell(String field, String value, long capTime){
this.field = field;
this.capTime = capTime;
this.value = value;
}
public String getField(){
return field;
}
public String getValue(){
return value;
}
public void setCapTime(long capTime) {
this.capTime = capTime;
}
public Long getCapTime() {
return capTime;
}
public String toString(){
return String.format("%s_[%s]_%s", field, capTime, value);
}
public int compareTo(HBaseCell o) {
return o.getCapTime().compareTo(this.capTime);
}
public boolean equals(Object obj) {
if(this == obj)return true ;
if(!(obj instanceof HBaseCell))return false ;
HBaseCell cell = (HBaseCell)obj;
if(field.equals(cell.getField()) && value.equals(cell.getValue())){
if(cell.getCapTime() < capTime){
cell.setCapTime(this.capTime);
}
return true;
}
return false;
}
public int hashCode(){
return this.field.hashCode() + 31*this.value.hashCode();
}
}
com/hsiehchou/hbase/entity/HBaseRow.java
package com.hsiehchou.hbase.entity;
public class HBaseRow extends AbstractRow<HBaseCell> {
public HBaseRow(String rowKey){
super(rowKey);
}
public boolean[] addCell(String field, HBaseCell ... cells){
boolean[] status = new boolean[cells.length];
for(int i = 0; i < cells.length; i++){
status[i] = addCell(field, cells[i]);
}
return status;
}
protected HBaseCell createCell(String field, String value, long capTime) {
return new HBaseCell(field, value, capTime);
}
}
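HBaseRow/HBaseCell 主要配合后面的 MultiVersionRowExtrator 使用,用于按字段聚合多版本数据。下面是一个示意用法(MAC、手机号、IMSI 均为假设数据):
package com.hsiehchou.hbase.entity;

//示意代码:演示 HBaseRow 的基本用法
public class HBaseRowDemo {
    public static void main(String[] args) {
        HBaseRow row = new HBaseRow("aa-bb-cc-dd-ee-ff");
        row.addCell("phone", "13800000000", 1554350400000L);
        //field+value相同的cell不会重复加入(HashMultimap按equals/hashCode去重)
        row.addCell("phone", "13800000000", 1554436800000L);
        row.addCell("imsi", "460011418603055", 1554350400000L);
        System.out.println(row.getFields());     //出现过的字段集合
        System.out.println(row.getMaxCapTime()); //最新的采集时间 1554436800000
    }
}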
com/hsiehchou/hbase/extractor/BaseListRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class BaseListRowExtrator implements RowExtractor<List<String>>{
private List<String> row;
public Long lastcjtime = 0l;
public Long firstcjtime = 0l;
@Override
public List<String> extractRowData(Result result, int rowNum)
throws IOException {
row = new ArrayList<String>();
for(Cell cell : result.listCells()) {
String column = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
if(column.equalsIgnoreCase("cjtime")) {
Long v = Long.parseLong(value);
if(v > lastcjtime) {
lastcjtime = v;
}
//firstcjtime初始为0,第一次遇到cjtime时直接赋值,之后取最小值
if(firstcjtime == 0L || v < firstcjtime) {
firstcjtime = v;
}
}
row.add(value);
}
return row;
}
}
com/hsiehchou/hbase/extractor/BaseMapRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class BaseMapRowExtrator implements RowExtractor<Map<String,String>> {
private Map<String,String> row;
private List<byte[]> rows;
private String longTimeField;
private SimpleDateFormat format;
private String field;
private String value;
private long time;
public BaseMapRowExtrator(){}
/**
* @param rows 需要提取 所有的 rowKey , null 则不提取
*/
public BaseMapRowExtrator(List<byte[]> rows){
this.rows = rows;
}
/**
* @param rows 需要提取 所有的 rowKey , null 则不提取
* @param longTimeField long类型的时间字段 表示需要将其转换称 String 类型
*/
public BaseMapRowExtrator(List<byte[]> rows,String longTimeField){
this.rows = rows;
this.longTimeField = longTimeField;
}
/**
* @param rows 需要提取 所有的 rowKey , null 则不提取
* @param longTimeField long类型的时间字段
* @param timePattern 表示需要以该指定的格式 将时间字段的值转换成字符串
*/
public BaseMapRowExtrator(List<byte[]> rows,String longTimeField,String timePattern){
this.rows = rows;
this.longTimeField = longTimeField;
if(StringUtils.isNotBlank(timePattern)){
format = new SimpleDateFormat(timePattern);
}
}
public Map<String, String> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,String>();
List<Cell> cells = result.listCells();
for(Cell cell : cells) {
field = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
if( field.equals(longTimeField) ){
time = Bytes.toLong(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
if( format != null ){
value = format.format(new Date(time));
}else{
value = String.valueOf(time);
}
}else{
value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
}
row.put(field,value);
}
if( rows != null ){
rows.add(result.getRow());
}
return row;
}
}
com/hsiehchou/hbase/extractor/BaseMapWithRowKeyExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class BaseMapWithRowKeyExtrator implements RowExtractor<Map<String,String>> {
private Map<String,String> row;
/* (non-Javadoc)
* @see com.bh.d406.bigdata.hbase.extractor.RowExtractor#extractRowData(org.apache.hadoop.hbase.client.Result, int)
*/
@Override
public Map<String, String> extractRowData(Result result, int rowNum)
throws IOException {
row = new HashMap<String,String>();
row.put("rowKey", Bytes.toString( result.getRow() ));
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/BeanRowExtrator.java
package com.hsiehchou.hbase.extractor;
import com.google.common.collect.Maps;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Map;
public class BeanRowExtrator<T> implements RowExtractor<T> {
private static final Logger LOG = LoggerFactory.getLogger(BeanRowExtrator.class);
private Class<T> clazz;
private Map<String,Field> fieldMap;
public BeanRowExtrator(Class<T> clazz){
this.clazz = clazz;
this.fieldMap = getDeclaredFields(clazz);
}
public T extractRowData(Result result, int rowNum) throws IOException {
return resultReflectToClass(result, rowNum);
}
private T resultReflectToClass(Result result, int rowNum){
String column = null;
Field field = null;
T obj = null;
try {
obj = clazz.newInstance();
for(Cell cell : result.listCells()){
column = Bytes.toString(cell.getQualifierArray(),
cell.getQualifierOffset(), cell.getQualifierLength());
/*检查该列是否在实体类中存在对应的属性,若存在则 为其赋值*/
if((field = fieldMap.get(column.toLowerCase())) != null){
field.set(obj, Bytes.toString(cell.getValueArray(),
cell.getValueOffset(), cell.getValueLength()));
}
}
} catch (InstantiationException e) {
LOG.error(String.format("解析第%个满足条件的记录%s失败。", rowNum, result), e);
} catch (IllegalAccessException e) {
LOG.error(String.format("解析第%s个满足条件的记录%s失败。", rowNum, result), e);
}
return obj;
}
private Map<String,Field> getDeclaredFields(Class<?> clazz){
Field[] fields = clazz.getDeclaredFields();
Field field = null;
Map<String,Field> fieldMap = Maps.newHashMapWithExpectedSize(fields.length);
for(int i = 0; i < fields.length; i++){
field = fields[i];
if(field.getModifiers() == 2){
field.setAccessible(true);
fieldMap.put(field.getName().toLowerCase(), field);
}
}
fields = null;
return fieldMap;
}
}
com/hsiehchou/hbase/extractor/CellNumExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
public class CellNumExtrator implements RowExtractor<Integer> {
public Integer extractRowData(Result result, int rowNum) throws IOException {
return result.listCells().size();
}
}
com/hsiehchou/hbase/extractor/MapLongRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class MapLongRowExtrator implements RowExtractor<Map<String,Long>> {
private Map<String,Long> row;
@Override
public Map<String, Long> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,Long>();
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toLong(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/MapRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
public class MapRowExtrator implements RowExtractor<Map<String,String>>,Serializable {
private static final long serialVersionUID = 1543027485077396235L;
private Map<String,String> row;
/* (non-Javadoc)
* @see com.bh.d406.bigdata.hbase.extractor.RowExtractor#extractRowData(org.apache.hadoop.hbase.client.Result, int)
*/
@Override
public Map<String, String> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,String>();
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/MultiVersionRowExtrator.java
package com.hsiehchou.hbase.extractor;
import com.hsiehchou.hbase.entity.HBaseRow;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class MultiVersionRowExtrator implements RowExtractor<HBaseRow>{
private HBaseRow row;
public HBaseRow extractRowData(Result result, int rowNum) throws IOException {
row = new HBaseRow(Bytes.toString(result.getRow()));
String field = null;
String value = null;
long capTime = 0L;
for(Cell cell : result.listCells()){
field = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
capTime = cell.getTimestamp();
row.addCell(field, value, capTime);
}
return row ;
}
}
com/hsiehchou/hbase/extractor/OneColumnRowByteExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
import java.io.Serializable;
public class OneColumnRowByteExtrator implements RowExtractor<byte[]> ,Serializable{
private static final long serialVersionUID = -3420092335124240222L;
private byte[] cf;
private byte[] cl;
public OneColumnRowByteExtrator( byte[] cf,byte[] cl ){
this.cf = cf;
this.cl = cl;
}
public byte[] extractRowData(Result result, int rowNum) throws IOException {
return result.getValue(cf, cl);
}
}
com/hsiehchou/hbase/extractor/OneColumnRowStringExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.io.Serializable;
public class OneColumnRowStringExtrator implements RowExtractor<String> , Serializable{
private static final long serialVersionUID = -8585637277902568648L;
private byte[] cf ;
private byte[] cl ;
public OneColumnRowStringExtrator( byte[] cf , byte[] cl ){
this.cf = cf;
this.cl = cl;
}
/* (non-Javadoc)
* @see com.bh.d406.bigdata.hbase.extractor.RowExtractor#extractRowData(org.apache.hadoop.hbase.client.Result, int)
*/
@Override
public String extractRowData(Result result, int rowNum) throws IOException {
byte[] value = result.getValue(cf, cl);
if( value == null ) return null;
return Bytes.toString( value ) ;
}
}
com/hsiehchou/hbase/extractor/OnlyRowKeyExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
public class OnlyRowKeyExtrator implements RowExtractor<byte[]> {
@Override
public byte[] extractRowData(Result result, int rowNum) throws IOException {
// TODO Auto-generated method stub
return result.getRow();
}
}
com/hsiehchou/hbase/extractor/OnlyRowKeyStringExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class OnlyRowKeyStringExtrator implements RowExtractor<String> {
public String extractRowData(Result result, int rowNum) throws IOException {
return Bytes.toString( result.getRow() );
}
}
com/hsiehchou/hbase/extractor/RowExtractor.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.client.Result;
import java.io.IOException;
public interface RowExtractor<T> {
/**
* description:
* @param result result解析器
* @param rowNum
* @return
* @throws Exception
* T
*/
T extractRowData(Result result, int rowNum) throws IOException;
}
com/hsiehchou/hbase/extractor/SingleColumnMultiVersionRowExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.Set;
public class SingleColumnMultiVersionRowExtrator implements RowExtractor<Set<String>>{
private Set<String> values;
private byte[] cf;
private byte[] cl;
/**
* 单列解析器 获取hbase 单列多版本数据
* @param cf 列簇
* @param cl 列
* @param values 返回值
*/
public SingleColumnMultiVersionRowExtrator(byte[] cf, byte[] cl, Set<String> values){
this.cf = cf;
this.cl = cl;
this.values = values;
}
public Set<String> extractRowData(Result result, int rowNum) throws IOException {
for(Cell cell : result.getColumnCells(cf, cl)){
values.add(Bytes.toString(cell.getValueArray(),cell.getValueOffset(), cell.getValueLength()));
}
return values;
}
}
com/hsiehchou/hbase/extractor/StrToByteExtrator.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
public class StrToByteExtrator implements RowExtractor<Map<String,byte[]>> ,Serializable {
private static final long serialVersionUID = 4633698173362569711L;
private Map<String,byte[]> row;
@Override
public Map<String, byte[]> extractRowData(Result result, int rowNum) throws IOException {
row = new HashMap<String,byte[]>();
for(Cell cell : result.listCells()) {
row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),
Bytes.copy(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
return row;
}
}
com/hsiehchou/hbase/extractor/ToRowList.java
package com.hsiehchou.hbase.extractor;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
/**
* Hbase数据库中数据提取接口实现:
* 提取result的rowKey,和每个cell的值作为一行数据,
* 一个cell=(row, family:qualifier:value, version)
*
* <p>
* 每行数据的格式为:{rowKey column${separator}value column${separator}value ...}
* 其中,不同的列之间用空格分隔,同样列元素的描述符与值之间用${separator}分隔
*/
public class ToRowList implements RowExtractor<List<String>> {
private Boolean currentVersion; //currentVersion为true:只取当前最新版本,false:取所有版本
private char separator; //不同元素之间拼接时的分隔符,默认为`#`
private ToRowList(Boolean currentVersion, char separator) {
this.separator = separator;
this.currentVersion = currentVersion;
}
public ToRowList(Boolean currentVersion) {
this(currentVersion, '#');
}
public ToRowList() {
this(true, '#');
}
/**
* 对{当前版本}存放在list[0] = {rowKey` `column`#`value` `column`#`value ...}
* 多版本的时候list({rowKey`#`version1` `column`#`value` `column`#`value ...},
* {rowKey`#`version2` `column`#`value` `column`#`value ...})
*/
@Override
public List<String> extractRowData(Result result, int rowNum) throws IOException {
if(result == null || result.isEmpty()) return null;
final char SPACE = ' ';
List<String> rows = new LinkedList<>();
//一个result是同一个rowKey的所有cells集合
String rowKey = Bytes.toString(result.getRow());
//build rowKey` `column`#`value` `column`#`value ...
StringBuilder row = new StringBuilder();
row.append(rowKey).append(SPACE);
//用于处理不同版本的映射
Map<Long, String> version2qualifiersAndValues = new HashMap<>();
List<Cell> cells = result.listCells();
for (Cell cell : cells) {
String value = Bytes.toString(cell.getValueArray(),
cell.getValueOffset(), cell.getValueLength());
String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
if (currentVersion) {
row.append(qualifier).append(separator).append(value).append(SPACE);
} else {
Long version = cell.getTimestamp();
String tmp = version2qualifiersAndValues.get(version);
version2qualifiersAndValues.put(version,
StringUtils.isNotBlank(tmp) ? tmp + " " + qualifier + separator + value
: rowKey + separator + version + " " + qualifier + separator + value);
}
}
if (currentVersion) {
rows.add(row.toString());
} else {
for (String v : version2qualifiersAndValues.values()) {
rows.add(v);
}
}
return rows;
}
}
com/hsiehchou/hbase/extractor/ToRowMap.java
package com.hsiehchou.hbase.extractor;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
* currentVersion 标识是否取多版本的数据,默认取当前版本
* 对当前版本,返回row`#`qualifier->value的映射
* 对多个版本,返回row`#`version`#`qualifier->value的映射
*/
public class ToRowMap implements RowExtractor<Map<String, String>> {
private Boolean currentVersion;
public ToRowMap() {
this(true);
}
private ToRowMap(Boolean currentVersion) {
this.currentVersion = currentVersion;
}
@Override
public Map<String, String> extractRowData(Result result, int rowNum)
throws IOException {
if(result == null || result.isEmpty()) return null;
final char HashTag = '#';
HashMap<String, String> col2value = new HashMap<>();
String rowKey = Bytes.toString(result.getRow());
for (Cell cell : result.listCells()) {
String value = Bytes.toString(cell.getValueArray(),
cell.getValueOffset(), cell.getValueLength());
String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
if (currentVersion)
col2value.put(rowKey + HashTag + qualifier, value);
else {
long version = cell.getTimestamp();
col2value.put(rowKey + HashTag + version + HashTag + qualifier, value);
}
}
return col2value;
}
}
com/hsiehchou/hbase/insert/HBaseInsertException.java
package com.hsiehchou.hbase.insert;
import java.util.Iterator;
public class HBaseInsertException extends Exception{
public HBaseInsertException(String message) {
super(message);
}
public final synchronized void addSuppresseds(Iterable<Exception> exceptions){
if(exceptions != null){
Iterator<Exception> iterator = exceptions.iterator();
while (iterator.hasNext()){
addSuppressed(iterator.next());
}
}
}
}
com/hsiehchou/hbase/insert/HBaseInsertHelper.java
package com.hsiehchou.hbase.insert;
import com.hsiehchou.hbase.config.HBaseTableUtil;
import com.google.common.collect.Lists;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
* 添加HBASE 插入数据类
*/
public class HBaseInsertHelper implements Serializable{
private HBaseInsertHelper(){}
public static void put(String tableName, Put put) throws Exception {
put(tableName, Lists.newArrayList(put));
}
public static void put(String tableName, List<Put> puts) throws Exception {
if(!puts.isEmpty()){
Table table = HBaseTableUtil.getTable(tableName);
try {
table.put(puts);
}catch (Exception e){
e.printStackTrace();
}finally {
HBaseTableUtil.close(table);
}
}
}
public static void put(final String tableName, List<Put> puts, int perThreadPutSize) throws Exception {
int size = puts.size();
if(size > perThreadPutSize){
int threadNum = (int)Math.ceil(size / (double)perThreadPutSize);
ExecutorService executorService = Executors.newFixedThreadPool(threadNum);
final CountDownLatch cdl = new CountDownLatch(threadNum);
final List<Exception> es = Collections.synchronizedList(new ArrayList<Exception>());
try {
for(int i = 0; i < threadNum; i++){
final List<Put> tmp;
if(i == (threadNum - 1)){
tmp = puts.subList(perThreadPutSize*i, size);
}else{
tmp = puts.subList(perThreadPutSize*i, perThreadPutSize*(i + 1));
}
executorService.execute(new Runnable() {
public void run() {
try {
if(es.isEmpty()) put(tableName, tmp);
} catch (Exception e) {
es.add(e);
}finally {
cdl.countDown();
}
}
});
}
cdl.await();
}finally {
executorService.shutdown();
}
if(es.size() > 0){
HBaseInsertException insertException = new HBaseInsertException(String.format("put数据到表%s失败。", tableName));
insertException.addSuppresseds(es);
throw insertException;
}
}else {
put(tableName, puts);
}
}
public static void checkAndPut(String tableName, byte[] row, byte[] family, byte[] qualifier,
byte[] value, Put put) throws Exception {
checkAndPut(tableName, row, family, qualifier, null, value, put);
}
public static void checkAndPut(String tableName, byte[] row, byte[] family, byte[] qualifier,
CompareOp compareOp, byte[] value, Put put) throws Exception {
if(!put.isEmpty() ){
Table table = HBaseTableUtil.getTable(tableName);
try {
if(compareOp == null){
table.checkAndPut(row, family, qualifier, value, put);
}else{
table.checkAndPut(row, family, qualifier, compareOp, value, put);
}
}finally{
HBaseTableUtil.close(table);
}
}
}
}
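下面是 HBaseInsertHelper 的一个批量写入示意(非项目源码,表名 test:wechat、列簇 cf、字段均为假设;perThreadPutSize 取 2000 表示每个线程最多提交 2000 条):
package com.hsiehchou.hbase.insert;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.ArrayList;
import java.util.List;

//示意代码:演示多线程批量put
public class InsertDemo {
    public static void main(String[] args) throws Exception {
        List<Put> puts = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            Put put = new Put(Bytes.toBytes("rowkey_" + i));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("phone"), Bytes.toBytes("1380000" + i));
            puts.add(put);
        }
        //超过2000条时内部会切分成多个子列表并交给线程池并发写入
        HBaseInsertHelper.put("test:wechat", puts, 2000);
    }
}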
com/hsiehchou/hbase/search/HBaseSearchService.java
package com.hsiehchou.hbase.search;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Scan;
import java.io.IOException;
import java.util.List;
import java.util.Map;
public interface HBaseSearchService {
/**
* 根据 用户 给定的解析类 解析 查询结果
* @param tableName
* @param scan
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> search(String tableName, Scan scan, RowExtractor<T> extractor) throws IOException;
/**
* 当存在多个 scan时 采用多线程查询
* @param tableName
* @param scans
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Scan> scans, RowExtractor<T> extractor) throws IOException;
/**
* 采用多线程 同时查询多个表
* @param more
* @return
* @throws IOException
* List<T>
*/
<T> Map<String,List<T>> searchMore(List<SearchMoreTable<T>> more) throws IOException;
/**
* 利用反射 自动封装实体类
* @param tableName
* @param scan
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* @throws IllegalAccessException
* List<T>
*/
<T> List<T> search(String tableName, Scan scan, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* 当存在多个 scan 时 采用多线程查询
* @param tableName
* @param scans
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* @throws IllegalAccessException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Scan> scans, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* 批量 get 查询 并按自定义的方式解析结果集
* @param tableName
* @param gets
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException;
/**
* 多线程批量get, 并按自定义的方式解析结果集
* 建议 : perThreadExtractorGetNum >= 100
* @param tableName
* @param gets
* @param perThreadExtractorGetNum 每个线程处理的 get的个数
* @param extractor 用户自定义的 结果解析 类
* @return
* @throws IOException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, RowExtractor<T> extractor) throws IOException;
/**
* 批量 get 查询 并利用反射 封装到指定的实体类中
* @param tableName
* @param gets
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* List<T>
*/
<T> List<T> search(String tableName, List<Get> gets, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* 多线程批量 get 查询 并利用反射 封装到指定的实体类中
* 建议 : perThreadExtractorGetNum >= 100
* @param tableName
* @param gets
* @param perThreadExtractorGetNum 每个线程处理的 get的个数
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return
* @throws IOException
* @throws InstantiationException
* @throws IllegalAccessException
* List<T>
*/
<T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
/**
* get 查询 并按自定义的方式解析结果集
* @param tableName
* @param extractor 用户自定义的 结果解析 类
* @return 如果 查询不到 则 返回 null
* @throws IOException
* List<T>
*/
<T> T search(String tableName, Get get, RowExtractor<T> extractor) throws IOException;
/**
* get 查询 并利用反射 封装到指定的实体类中
* @param tableName
* @param cls HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
* @return 如果 查询不到 则 返回 null
* @throws IOException
* @throws InstantiationException
* List<T>
*/
<T> T search(String tableName, Get get, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;
}
com/hsiehchou/hbase/search/HBaseSearchServiceImpl.java
package com.hsiehchou.hbase.search;
import com.hsiehchou.hbase.config.HBaseTableFactory;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
public class HBaseSearchServiceImpl implements HBaseSearchService,Serializable{
private static final long serialVersionUID = -8657479861137115645L;
private static final Logger LOG = LoggerFactory.getLogger(HBaseSearchServiceImpl.class);
private HBaseTableFactory factory = new HBaseTableFactory();
private int poolCapacity = 6;
@Override
public <T> List<T> search(String tableName, Scan scan, RowExtractor<T> extractor) throws IOException {
return null;
}
@Override
public <T> List<T> searchMore(String tableName, List<Scan> scans, RowExtractor<T> extractor) throws IOException {
return null;
}
@Override
public <T> Map<String, List<T>> searchMore(List<SearchMoreTable<T>> more) throws IOException {
return null;
}
@Override
public <T> List<T> search(String tableName, Scan scan, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> List<T> searchMore(String tableName, List<Scan> scans, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException {
List<T> data = new ArrayList<T>();
search(tableName, gets, extractor,data);
return data;
}
@Override
public <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, RowExtractor<T> extractor) throws IOException {
return null;
}
@Override
public <T> List<T> search(String tableName, List<Get> gets, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
@Override
public <T> T search(String tableName, Get get, RowExtractor<T> extractor) throws IOException {
T obj = null;
List<T> res = search(tableName,Arrays.asList(get),extractor);
if( !res.isEmpty()){
obj = res.get(0);
}
return obj;
}
@Override
public <T> T search(String tableName, Get get, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
return null;
}
private <T> void search(String tableName, List<Get> gets,
RowExtractor<T> extractor , List<T> data ) throws IOException {
//根据table名获取表连接
Table table = factory.getHBaseTableInstance(tableName);
if(table != null ){
Result[] results = table.get(gets);
int n = 0;
T row = null;
for( Result result : results){
if( !result.isEmpty() ){
row = extractor.extractRowData(result, n);
if(row != null )data.add(row);
n++;
}
}
close( table, null);
}else{
throw new IOException(" table " + tableName + " is not exists ..");
}
}
public static boolean existsRowkey( Table table, String rowkey){
boolean exists =true;
try {
exists = table.exists(new Get(rowkey.getBytes()));
} catch (IOException e) {
LOG.error("失败。", e );
}
return exists;
}
public static void close( Table table, ResultScanner scanner ){
try {
if( table != null ){
table.close();
table = null;
}
if( scanner != null ){
scanner.close();
scanner = null;
}
} catch (IOException e) {
LOG.error("关闭 HBase的表 " + table.getName().toString() + " 失败。", e );
}
}
}
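HBaseSearchServiceImpl 目前只实现了基于 Get 的查询路径,下面给出一个点查示意(表名、rowkey 均为假设):
package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.extractor.MapRowExtrator;
import org.apache.hadoop.hbase.client.Get;
import java.util.Map;

//示意代码:按rowkey点查并解析成 列名->值 的Map
public class SearchDemo {
    public static void main(String[] args) throws Exception {
        HBaseSearchService searchService = new HBaseSearchServiceImpl();
        Get get = new Get("aa-bb-cc-dd-ee-ff".getBytes());
        //查不到时返回null
        Map<String, String> row = searchService.search("test:relation", get, new MapRowExtrator());
        System.out.println(row);
    }
}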
com/hsiehchou/hbase/search/SearchMoreTable.java
package com.hsiehchou.hbase.search;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.Scan;
public class SearchMoreTable<T> {
private String tableName;
private Scan scan;
private RowExtractor<T> extractor;
public SearchMoreTable() {
super();
}
public SearchMoreTable(String tableName, Scan scan,
RowExtractor<T> extractor) {
super();
this.tableName = tableName;
this.scan = scan;
this.extractor = extractor;
}
public String getTableName() {
return tableName;
}
public void setTableName(String tableName) {
this.tableName = tableName;
}
public Scan getScan() {
return scan;
}
public void setScan(Scan scan) {
this.scan = scan;
}
public RowExtractor<T> getExtractor() {
return extractor;
}
public void setExtractor(RowExtractor<T> extractor) {
this.extractor = extractor;
}
}
com/hsiehchou/hbase/spilt/SpiltRegionUtil.java
package com.hsiehchou.hbase.spilt;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.Iterator;
import java.util.TreeSet;
/**
* hbase 预分区
*/
public class SpiltRegionUtil {
/**
* 定义分区
* @return
*/
public static byte[][] getSplitKeysBydinct() {
String[] keys = new String[]{"1","2", "3","4", "5","6", "7","8", "9","a","b", "c","d","e","f"};
//String[] keys = new String[]{"10|", "20|", "30|", "40|", "50|", "60|", "70|", "80|", "90|"};
byte[][] splitKeys = new byte[keys.length][];
//通过treeset排序
TreeSet<byte[]> rows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);//升序排序
for (int i = 0; i < keys.length; i++) {
rows.add(Bytes.toBytes(keys[i]));
}
Iterator<byte[]> rowKeyIter = rows.iterator();
int i = 0;
while (rowKeyIter.hasNext()) {
byte[] tempRow = rowKeyIter.next();
rowKeyIter.remove();
splitKeys[i] = tempRow;
i++;
}
return splitKeys;
}
}
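如果 rowkey 的首字符是 0-9、a-f(例如以 MAC 作为 rowkey),可以用 getSplitKeysBydinct 生成16个分区键,配合 HBaseTableUtil 建表,示意片段如下(表名 test:relation 为假设):
//示意片段:16个分区键建表,ttl传-1表示永久保存
HTableDescriptor htd = HBaseTableUtil.createHTableDescriptor("test:relation", "cf", false, -1, 100, false);
HBaseTableUtil.createTable(htd, SpiltRegionUtil.getSplitKeysBydinct());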
6、执行
spark-submit \
--master local[1] \
--num-executors 1 \
--driver-memory 300m \
--executor-memory 500m \
--executor-cores 1 \
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') \
--class com.hsiehchou.spark.streaming.kafka.kafka2hbase.DataRelationStreaming \
/usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
7、执行截图
十二、SpringCloud 项目构建
解决IntelliJ IDEA 创建Maven项目速度慢问题
add Maven Property
Name:archetypeCatalog
Value:internal
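archetypeCatalog=internal 的作用是让 archetype 插件使用本地内置的骨架目录,不再到远程仓库拉取 archetype-catalog.xml,因此创建项目会明显变快。等价的命令行写法示意如下:
mvn archetype:generate -DarchetypeCatalog=internal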
1、构建SpringCloud父项目
在原项目下新建 xz_bigdata_springcloud_dir目录
2、在此目录下新建 xz_bigdata_springcloud_root项目
3、 引入SpringCloud依赖
父pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<modules>
<module>xz_bigdata_springcloud_common</module>
<module>xz_bigdata_springcloud_esquery</module>
<module>xz_bigdata_springcloud_eureka</module>
<module>xz_bigdata_springcloud_hbasequery</module>
</modules>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.9.RELEASE</version>
</parent>
<groupId>com.hsiehchou.springcloud</groupId>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>xz_bigdata_springcloud_root</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<!--CDH源-->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<!--依赖管理,用于管理spring-cloud的依赖-->
<dependencyManagement>
<dependencies>
<!--spring-cloud-dependencies-->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>Finchley.SR1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<!--打包插件-->
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
删除父项目src目录。因为这个项目主要是管理子项目不做任何逻辑业务
4、构建SpringCloud Common子项目
新建子模块
xz_bigdata_springcloud_common
引入依赖
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<groupId>com.hsiehchou.springcloud</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_springcloud_common</artifactId>
<name>xz_bigdata_springcloud_common</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<!--eureka-server-->
<!-- https://mvnrepository.com/artifact/org.springframework.cloud/spring-cloud-starter-eureka-server -->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-netflix-eureka-server</artifactId>
<exclusions>
<exclusion>
<artifactId>HdrHistogram</artifactId>
<groupId>org.hdrhistogram</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.24</version>
</dependency>
</dependencies>
</project>
5、构建Eureka服务注册中心
新建xz_bigdata_springcloud_eureka子模块
引入依赖
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<groupId>com.hsiehchou.springcloud</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_springcloud_eureka</artifactId>
<name>xz_bigdata_springcloud_eureka</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>com.hsiehchou.springcloud</groupId>
<artifactId>xz_bigdata_springcloud_common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<!--用户验证-->
<!-- <dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-security</artifactId>
<version>1.4.1.RELEASE</version>
</dependency>-->
</dependencies>
<build>
<plugins>
<plugin><!--打包依赖的jar包-->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
<stripVersion>false</stripVersion> <!-- 去除版本信息 -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- 拷贝项目依赖包到lib/目录下 -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
<!-- 打成jar包插件 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.5</version>
<configuration>
<archive>
<!--
生成的jar中,不要包含pom.xml和pom.properties这两个文件
-->
<addMavenDescriptor>false</addMavenDescriptor>
<!-- 生成MANIFEST.MF的设置 -->
<manifest>
<!-- 为依赖包添加路径, 这些路径会写在MANIFEST文件的Class-Path下 -->
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<!-- jar启动入口类-->
<mainClass>com.hsiehchou.springcloud.eureka.EurekaApplication</mainClass>
</manifest>
<!-- <manifestEntries>
<!– 在Class-Path下添加配置文件的路径 –>
<Class-Path></Class-Path>
</manifestEntries>-->
</archive>
<outputDirectory>${project.build.directory}/</outputDirectory>
<includes>
<!-- 打jar包时,只打包class文件 -->
<include>**/*.class</include>
<include>**/*.properties</include>
<include>**/*.yml</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
</project>
新建resources配置文件目录,添加application.yml配置文件或者 application.properties
application.yml
server:
port: 8761
eureka:
client:
register-with-eureka: false
fetch-registry: false
service-url:
defaultZone: http://root:root@hadoop3:8761/eureka/
新建EurekaApplication 启动类
EurekaApplication.java
package com.hsiehchou.springcloud.eureka;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;
/**
* 注册中心
*/
@SpringBootApplication
@EnableEurekaServer
public class EurekaApplication
{
public static void main( String[] args )
{
SpringApplication.run(EurekaApplication.class, args);
}
}
执行EurekaApplication 启动
访问localhost:8761
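也可以直接调用 Eureka 的 REST 接口确认注册中心已正常启动(示意命令,假设在本机启动且未开启安全认证):
curl http://localhost:8761/eureka/apps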
6、构建HBase查询服务模块
新建xz_bigdata_springcloud_hbasequery子模块
添加依赖
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>xz_bigdata_springcloud_root</artifactId>
<groupId>com.hsiehchou.springcloud</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>xz_bigdata_springcloud_hbasequery</artifactId>
<name>xz_bigdata_springcloud_hbasequery</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<!--spring common依赖-->
<dependency>
<groupId>com.hsiehchou.springcloud</groupId>
<artifactId>xz_bigdata_springcloud_common</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>HdrHistogram</artifactId>
<groupId>org.hdrhistogram</groupId>
</exclusion>
</exclusions>
</dependency>
<!--基础服务hbase依赖-->
<dependency>
<groupId>com.hsiehchou</groupId>
<artifactId>xz_bigdata_hbase</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<artifactId>fastjson</artifactId>
<groupId>com.alibaba</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<plugin><!--打包依赖的jar包-->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
<stripVersion>false</stripVersion> <!-- 去除版本信息 -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- 拷贝项目依赖包到lib/目录下 -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
<!-- 打成jar包插件 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.5</version>
<configuration>
<archive>
<!--
生成的jar中,不要包含pom.xml和pom.properties这两个文件
-->
<addMavenDescriptor>false</addMavenDescriptor>
<!-- 生成MANIFEST.MF的设置 -->
<manifest>
<!-- 为依赖包添加路径, 这些路径会写在MANIFEST文件的Class-Path下 -->
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<!-- jar启动入口类-->
<mainClass>com.hsiehchou.springcloud.HbaseQueryApplication</mainClass>
</manifest>
<!-- <manifestEntries>
<!– 在Class-Path下添加配置文件的路径 –>
<Class-Path></Class-Path>
</manifestEntries>-->
</archive>
<outputDirectory>${project.build.directory}/</outputDirectory>
<includes>
<!-- 打jar包时,只打包class文件 -->
<include>**/*.class</include>
<include>**/*.properties</include>
<include>**/*.yml</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
</project>
添加配置文件
新建 resources 目录
添加 application.properties 文件
server.port=8002
logging.level.root=INFO
logging.level.org.hibernate=INFO
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
logging.level.org.hibernate.type.descriptor.sql.BasicExtractor=TRACE
logging.level.com.itmuch=DEBUG
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true
eureka.client.serviceUrl.defaultZone=http://root:root@hadoop3:8761/eureka/
spring.application.name=xz-bigdata-springcloud-hbasequery
eureka.instance.prefer-ip-address=true
构建启动类
新建 com.hsiehchou.springcloud.hbase包
构建 HbaseQueryApplication 启动类
package com.hsiehchou.springcloud;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
@SpringBootApplication
@EnableEurekaClient
public class HbaseQueryApplication
{
public static void main( String[] args )
{
SpringApplication.run(HbaseQueryApplication.class, args);
}
}
启动 HbaseQueryApplication 后,在 Eureka 控制台(localhost:8761)的服务列表中能看到该服务,说明注册成功。
构建服务
构建 com.hsiehchou.springcloud.hbase.controller
创建 HbaseBaseController
HbaseBaseController.java
package com.hsiehchou.springcloud.hbase.controller;
import com.hsiehchou.hbase.extractor.SingleColumnMultiVersionRowExtrator;
import com.hsiehchou.hbase.search.HBaseSearchService;
import com.hsiehchou.hbase.search.HBaseSearchServiceImpl;
import com.hsiehchou.springcloud.hbase.service.HbaseBaseService;
import org.apache.hadoop.hbase.client.Get;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.*;
import javax.annotation.Resource;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
@Controller
@RequestMapping(value="/hbase")
public class HbaseBaseController {
private static Logger LOG = LoggerFactory.getLogger(HbaseBaseController.class);
//注入:通过这个注解可以直接拿到 HbaseBaseService 的实例
@Resource
private HbaseBaseService hbaseBaseService;
@ResponseBody
@RequestMapping(value="/search/{table}/{rowkey}", method={RequestMethod.GET,RequestMethod.POST})
public Set<String> search(@PathVariable(value = "table") String table,
@PathVariable(value = "rowkey") String rowkey){
return hbaseBaseService.getSingleColumn(table,rowkey);
}
@ResponseBody
@RequestMapping(value="/search1", method={RequestMethod.GET,RequestMethod.POST})
public Set<String> search1( @RequestParam(name = "table") String table,
@RequestParam(name = "rowkey") String rowkey){
//通过二级索引去找主关联表的rowkey 这个rowkey就是MAC
return hbaseBaseService.getSingleColumn(table,rowkey);
}
@ResponseBody
@RequestMapping(value = "/getHbase",method = {RequestMethod.GET,RequestMethod.POST})
public Set<String> getHbase(@RequestParam(name="table") String table,
@RequestParam(name="rowkey") String rowkey){
return hbaseBaseService.getSingleColumn(table, rowkey);
}
@ResponseBody
@RequestMapping(value = "/getRelation",method = {RequestMethod.GET,RequestMethod.POST})
public Map<String,List<String>> getRelation(@RequestParam(name = "field") String field,
@RequestParam(name = "fieldValue") String fieldValue){
return hbaseBaseService.getRealtion(field,fieldValue);
}
public static void main(String[] args) {
HbaseBaseController hbaseBaseController = new HbaseBaseController();
hbaseBaseController.getHbase("send_mail", "65497873@qq.com");
}
}
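Once the service is registered with Eureka, the endpoints can be exercised over plain HTTP. A minimal client sketch; the host and port follow server.port=8002 above and the hadoop3 deployment used later, so adjust them to your environment:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HbaseQueryClientDemo {
    public static void main(String[] args) throws Exception {
        // hadoop3:8002 is taken from the configuration and deployment in this section; adjust as needed
        URL url = new URL("http://hadoop3:8002/hbase/search1?table=phone&rowkey=18609765012");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // JSON array of phone_mac values
            }
        } finally {
            conn.disconnect();
        }
    }
}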
Build com.hsiehchou.springcloud.hbase.service
Create HbaseBaseService
HbaseBaseService.java
package com.hsiehchou.springcloud.hbase.service;
import com.hsiehchou.hbase.entity.HBaseCell;
import com.hsiehchou.hbase.entity.HBaseRow;
import com.hsiehchou.hbase.extractor.MultiVersionRowExtrator;
import com.hsiehchou.hbase.extractor.SingleColumnMultiVersionRowExtrator;
import com.hsiehchou.hbase.search.HBaseSearchService;
import com.hsiehchou.hbase.search.HBaseSearchServiceImpl;
import org.apache.hadoop.hbase.client.Get;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.util.*;
@Service
public class HbaseBaseService {
private static Logger LOG = LoggerFactory.getLogger(HbaseBaseService.class);
/**
* Get the multi-version values of a single HBase column from the index table test:<field>
* @param field
* @param rowkey
* @return
*/
public Set<String> getSingleColumn(String field,String rowkey){
//Read the main relation table's rowkeys from the index table, i.e. the multi-version MACs for this phone
Set<String> search = null;
HBaseSearchService hBaseSearchService = new HBaseSearchServiceImpl();
String table = "test:"+field;
Get get = new Get(rowkey.getBytes());
try {
get.setMaxVersions(100);
} catch (IOException e) {
e.printStackTrace();
}
Set set = new HashSet<String>();
SingleColumnMultiVersionRowExtrator singleColumnMultiVersionRowExtrator = new SingleColumnMultiVersionRowExtrator("cf".getBytes(), "phone_mac".getBytes(), set);
try {
search = hBaseSearchService.search(table, get, singleColumnMultiVersionRowExtrator);
System.out.println(search.toString());
} catch (IOException e) {
e.printStackTrace();
}
return search;
}
/**
* Get the multi-version values of a single column from the given table
* @param table
* @param rowkey
* @param versions
* @return
*/
public Set<String> getSingleColumn(String table,String rowkey,int versions){
Set<String> search = null;
try {
HBaseSearchService baseSearchService = new HBaseSearchServiceImpl();
Get get = new Get(rowkey.getBytes());
get.setMaxVersions(versions);
Set set = new HashSet<String>();
SingleColumnMultiVersionRowExtrator singleColumnMultiVersionRowExtrator = new SingleColumnMultiVersionRowExtrator("cf".getBytes(), "phone_mac".getBytes(), set);
search = baseSearchService.search(table, get, singleColumnMultiVersionRowExtrator);
} catch (IOException e) {
LOG.error(null,e);
}
System.out.println(search);
return search;
}
/**
* Fetch the full relation records directly from a field value
* using an HBase secondary-index lookup
* @param field
* @param fieldValue
* @return
*/
public Map<String,List<String>> getRelation(String field,String fieldValue){
//Step 1: look up the multi-version rowkeys in the secondary-index table
Map<String,List<String>> map = new HashMap<>();
//The index table to query, e.g. test:send_mail
String table = "test:" + field;
String indexRowkey = fieldValue;
Set<String> relationRowkeys = this.getSingleColumn(table, indexRowkey, 100);
//Step 2: with the main-table rowkeys obtained from the secondary index,
//iterate over them and fetch all multi-version data stored under each rowkey.
//Wrap each relationRowkey in a Get.
List<Get> list = new ArrayList<>();
relationRowkeys.forEach(relationRowkey->{
//fetch everything stored under relationRowkey in the relation table
Get get = new Get(relationRowkey.getBytes());
try {
get.setMaxVersions(100);
} catch (IOException e) {
e.printStackTrace();
}
list.add(get);
});
MultiVersionRowExtrator multiVersionRowExtrator = new MultiVersionRowExtrator();
HBaseSearchService hBaseSearchService = new HBaseSearchServiceImpl();
try {
//<T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException;
List<HBaseRow> search = hBaseSearchService.search("test:relation", list, multiVersionRowExtrator);
search.forEach(hbaseRow->{
Map<String, Collection<HBaseCell>> cellMap = hbaseRow.getCell();
cellMap.forEach((key,value)->{
//convert Map<String,Collection<HBaseCell>> into Map<String,List<String>>
List<String> listValue = new ArrayList<>();
value.forEach(x->{
listValue.add(x.toString());
});
map.put(key,listValue);
});
});
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(map.toString());
return map;
}
public static void main(String[] args) {
HbaseBaseService hbaseBaseService = new HbaseBaseService();
// hbaseBaseService.getRelation("send_mail","65494533@qq.com");
hbaseBaseService.getSingleColumn("phone","18609765012");
}
}
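The service above goes through the project's HBaseSearchService wrapper. As a conceptual sketch only, the same two-step secondary-index lookup can be written with the plain HBase 1.x client; the table names follow the test: namespace used above, and the ZooKeeper quorum is an assumed value:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.HashSet;
import java.util.Set;

public class SecondaryIndexLookupSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop1,hadoop2,hadoop3"); // assumed quorum
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // Step 1: the index table test:phone maps a phone number to multi-version phone_mac values
            Set<String> macs = new HashSet<>();
            try (Table index = conn.getTable(TableName.valueOf("test:phone"))) {
                Get get = new Get(Bytes.toBytes("18609765012"));
                get.setMaxVersions(100);
                Result r = index.get(get);
                if (!r.isEmpty()) {
                    for (Cell c : r.rawCells()) {
                        macs.add(Bytes.toString(CellUtil.cloneValue(c)));
                    }
                }
            }
            // Step 2: each phone_mac is a rowkey in the main relation table test:relation
            try (Table relation = conn.getTable(TableName.valueOf("test:relation"))) {
                for (String mac : macs) {
                    Get get = new Get(Bytes.toBytes(mac));
                    get.setMaxVersions(100);
                    Result r = relation.get(get);
                    if (r.isEmpty()) {
                        continue;
                    }
                    for (Cell c : r.rawCells()) {
                        System.out.println(Bytes.toString(CellUtil.cloneQualifier(c)) + " = "
                                + Bytes.toString(CellUtil.cloneValue(c)));
                    }
                }
            }
        }
    }
}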
7. Build the ES query service
The Jest API goes over HTTP and talks to ES on port 9200.
Dependency:
<dependency>
<groupId>io.searchbox</groupId>
<artifactId>jest</artifactId>
<version>6.3.1</version>
</dependency>
Port 9200 speaks HTTP and is mainly used by external clients.
Port 9300 speaks the TCP transport protocol; Java clients that embed the ES jar communicate over it.
ES cluster nodes also communicate with each other over 9300.
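As a rough illustration of what the Jest wiring does, a JestClient can also be built by hand against the HTTP port 9200. The address matches spring.elasticsearch.jest.uris in the configuration below and is only an example value:

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

public class JestClientSketch {
    public static void main(String[] args) throws Exception {
        // build a Jest client against the HTTP port 9200
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder("http://192.168.116.201:9200")
                .multiThreaded(true)
                .build());
        JestClient client = factory.getObject();

        // simple match_all search against the mail index as a connectivity check
        String query = "{\"query\":{\"match_all\":{}},\"size\":1}";
        Search search = new Search.Builder(query).addIndex("mail").addType("mail").build();
        SearchResult result = client.execute(search);
        System.out.println(result.getJsonString());

        client.close();
    }
}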
Create the xz_bigdata_springcloud_esquery sub-module
Preparation
Create the resources configuration directory
Add the configuration file
application.properties
server.port=8003
logging.level.root=INFO
logging.level.org.hibernate=INFO
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
logging.level.org.hibernate.type.descriptor.sql.BasicExtractor=TRACE
logging.level.com.hsiehchou=DEBUG
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true
eureka.client.serviceUrl.defaultZone=http://root:root@hadoop3:8761/eureka/
spring.application.name=xz-bigdata-springcloud-esquery
eureka.instance.prefer-ip-address=true
# disable the Elasticsearch health check
management.health.elasticsearch.enabled=false
spring.elasticsearch.jest.uris=http://192.168.116.201:9200
# all indexes
esIndexs=wechat,mail,qq
Create the ES microservice startup class
ESqueryApplication.java
package com.hsiehchou.springcloud.es;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.openfeign.EnableFeignClients;
@SpringBootApplication
@EnableDiscoveryClient
@EnableFeignClients
public class ESqueryApplication {
public static void main(String[] args) {
SpringApplication.run(ESqueryApplication.class,args);
}
}
Start Eureka and then the ES microservice; if the service shows up in the Eureka console, registration succeeded.
Build com.hsiehchou.springcloud.es.controller
Create EsBaseController
package com.hsiehchou.springcloud.es.controller;
import com.hsiehchou.springcloud.es.feign.HbaseFeign;
import com.hsiehchou.springcloud.es.service.EsBaseService;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;
import javax.annotation.Resource;
import java.util.List;
import java.util.Map;
import java.util.Set;
@Controller
@RequestMapping(value = "/es")
public class EsBaseController {
@Value("${esIndexs}")
private String esIndexs;
@Resource
private EsBaseService esBaseService;
@Resource
private HbaseFeign hbaseFeign;
/**
* Basic query: filter by index/type, with sorting and paging
* @param indexName
* @param typeName
* @param sortField
* @param sortValue
* @param pageNumber
* @param pageSize
* @return
*/
@ResponseBody
@RequestMapping(value = "/getBaseInfo", method = {RequestMethod.GET, RequestMethod.POST})
public List<Map<String, Object>> getBaseInfo(@RequestParam(name = "indexName") String indexName,
@RequestParam(name = "typeName") String typeName,
@RequestParam(name = "sortField") String sortField,
@RequestParam(name = "sortValue") String sortValue,
@RequestParam(name = "pageNumber") int pageNumber,
@RequestParam(name = "pageSize") int pageSize) {
// Query by data type (indexName/typeName), with sorting (sortField/sortValue)
// and paging (pageNumber/pageSize)
return esBaseService.getBaseInfo(indexName,typeName,sortField,sortValue,pageNumber,pageSize);
}
/**
* Look up trajectory data by an arbitrary field condition
* @param field
* @param fieldValue
* @return
*/
@ResponseBody
@RequestMapping(value = "/getLocus", method = {RequestMethod.GET, RequestMethod.POST})
public List<Map<String, Object>> getLocus(@RequestParam(name = "field") String field,
@RequestParam(name = "fieldValue") String fieldValue) {
Set<String> macs = hbaseFeign.search1(field, fieldValue);
System.out.println(macs.toString());
// Resolve the field/value to phone_mac values via the HBase secondary index (through Feign),
// take the first matching MAC (assumes at least one hit), and query ES for its trajectory
String mac = macs.iterator().next();
return esBaseService.getLocus(mac);
}
/**
* Total record count for every index
* @return
*/
@ResponseBody
@RequestMapping(value="/getAllCount", method={RequestMethod.GET,RequestMethod.POST})
public Map<String,Long> getAllCount(){
Map<String, Long> allCount = esBaseService.getAllCount(esIndexs);
System.out.println(allCount);
return allCount;
}
@ResponseBody
@RequestMapping(value="/group", method={RequestMethod.GET,RequestMethod.POST})
public Map<String,Long> group(@RequestParam(name = "indexName") String indexName,
@RequestParam(name = "typeName") String typeName,
@RequestParam(name = "field") String field){
return esBaseService.aggregation(indexName,typeName,field);
}
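// Local smoke test only: esBaseService and hbaseFeign are not injected outside the Spring context,
// so prefer hitting the running endpoints instead (see the example requests in section 9).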
public static void main(String[] args){
EsBaseController esBaseController = new EsBaseController();
esBaseController.getLocus("phone","18609765432");
}
}
Build com.hsiehchou.springcloud.es.service
Create EsBaseService
package com.hsiehchou.springcloud.es.service;
import com.hsiehchou.es.jest.service.JestService;
import com.hsiehchou.es.jest.service.ResultParse;
import io.searchbox.client.JestClient;
import io.searchbox.core.SearchResult;
import org.springframework.stereotype.Service;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@Service
public class EsBaseService {
// Query by data type (indexName/typeName), with sorting and paging
public List<Map<String, Object>> getBaseInfo(String indexName,
String typeName,
String sortField,
String sortValue,
int pageNumber,
int pageSize) {
//run the query
JestClient jestClient = null;
List<Map<String, Object>> maps = null;
try {
jestClient = JestService.getJestClient();
SearchResult search = JestService.search(jestClient,
indexName,
typeName,
"",
"",
sortField,
sortValue,
pageNumber,
pageSize);
maps = ResultParse.parseSearchResultOnly(search);
} catch (Exception e) {
e.printStackTrace();
} finally {
JestService.closeJestClient(jestClient);
}
return maps;
}
// Could also take a time range, e.g. only the trajectory within the last 3 days
// In ES a text field is analyzed; for exact matching query its keyword sub-field instead, e.g. phone_mac.keyword
public List<Map<String, Object>> getLocus(String mac){
//run the query
JestClient jestClient = null;
List<Map<String, Object>> maps = null;
String[] includes = new String[]{"latitude","longitude","collect_time"};
try {
jestClient = JestService.getJestClient();
SearchResult search = JestService.search(jestClient,
"",
"",
"phone_mac.keyword",
mac,
"collect_time",
"asc",
1,
2000,
includes);
maps = ResultParse.parseSearchResultOnly(search);
} catch (Exception e) {
e.printStackTrace();
} finally {
JestService.closeJestClient(jestClient);
}
return maps;
}
public Map<String,Long> getAllCount(String esIndexs){
Map<String,Long> countMap = new HashMap<>();
JestClient jestClient = null;
try {
jestClient = JestService.getJestClient();
String[] split = esIndexs.split(",");
for (int i = 0; i < split.length; i++) {
String index = split[i];
Long count = JestService.count(jestClient, index, index);
countMap.put(index,count);
}
} catch (Exception e) {
e.printStackTrace();
}finally {
JestService.closeJestClient(jestClient);
}
return countMap;
}
public Map<String,Long> aggregation(String indexName,String typeName,String field){
JestClient jestClient = null;
Map<String, Long> stringLongMap = null;
try {
jestClient = JestService.getJestClient();
SearchResult aggregation = JestService.aggregation(jestClient, indexName, typeName, field);
stringLongMap = ResultParse.parseAggregation(aggregation);
} catch (Exception e) {
e.printStackTrace();
}finally {
JestService.closeJestClient(jestClient);
}
return stringLongMap;
}
}
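To make the .keyword note above concrete, here is a hand-written sketch of the kind of request JestService.search is expected to issue for getLocus: an exact term match on phone_mac.keyword, sorted by collect_time, returning only the trajectory fields. The query body, index name, and MAC value are illustrative, not the project's exact implementation:

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

public class LocusQuerySketch {
    public static void main(String[] args) throws Exception {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(
                new HttpClientConfig.Builder("http://192.168.116.201:9200").build());
        JestClient client = factory.getObject();

        // exact match on the keyword sub-field, sorted by collect_time,
        // returning only the trajectory columns
        String query = "{"
                + "\"_source\":[\"latitude\",\"longitude\",\"collect_time\"],"
                + "\"query\":{\"term\":{\"phone_mac.keyword\":\"aa-aa-aa-aa-aa-aa\"}},"
                + "\"sort\":[{\"collect_time\":{\"order\":\"asc\"}}],"
                + "\"size\":2000}";
        Search search = new Search.Builder(query).addIndex("wechat").addType("wechat").build();
        SearchResult result = client.execute(search);
        System.out.println(result.getJsonString());

        client.close();
    }
}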
This reuses the project's ES base service (JestService / ResultParse).
Trajectory query
The trajectory query also needs the HBase query service, which is called through Feign.
Spring Cloud Feign
Feign is a declarative pseudo-HTTP client that makes writing HTTP clients much simpler: you only create an interface and configure it with annotations, and Feign binds it to the service provider's endpoints, removing most of the client-side boilerplate.
Build com.hsiehchou.springcloud.es.feign
Create HbaseFeign
package com.hsiehchou.springcloud.es.feign;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;
import java.util.Set;
@FeignClient(name = "xz-bigdata-springcloud-hbasequery")
public interface HbaseFeign {
@ResponseBody
@RequestMapping(value="/hbase/search1", method=RequestMethod.GET)
public Set<String> search1(@RequestParam(name = "table") String table,
@RequestParam(name = "rowkey") String rowkey);
}
8. Manual deployment of the microservices
Add the packaging plugins to Maven
<build>
<plugins>
<plugin><!-- copy dependency jars -->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<excludeTransitive>false</excludeTransitive> <!-- whether to exclude transitive dependencies -->
<stripVersion>false</stripVersion> <!-- whether to strip version numbers from jar file names -->
</configuration>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<!-- copy project dependencies into the jars/ directory -->
<outputDirectory>${project.build.directory}/jars</outputDirectory>
<excludeTransitive>false</excludeTransitive>
<stripVersion>false</stripVersion>
</configuration>
</execution>
</executions>
</plugin>
<!-- 打成jar包插件 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.4</version>
<configuration>
<archive>
<!-- do not include pom.xml and pom.properties in the generated jar -->
<addMavenDescriptor>false</addMavenDescriptor>
<!-- MANIFEST.MF settings -->
<manifest>
<!-- add the dependency jars to the Class-Path entry of the MANIFEST -->
<addClasspath>true</addClasspath>
<classpathPrefix>jars/</classpathPrefix>
<!-- main class used when running the jar directly; set this to the module's own startup class.
The value below does not belong to this project, and the deployment commands later start each
service with java -cp and an explicit main class, so the manifest entry is not consulted. -->
<mainClass>com.cn.hbase.mr.HbaseMr</mainClass>
</manifest>
<!-- <manifestEntries>
add the path of external configuration files to Class-Path here if needed
<Class-Path></Class-Path>
</manifestEntries>-->
</archive>
<outputDirectory>${project.build.directory}/</outputDirectory>
<includes>
<!-- when building the jar, include only class files and config resources -->
<include>**/*.class</include>
<include>**/*.properties</include>
<include>**/*.yml</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
Because the microservices depend on xz_bigdata2, build and install xz_bigdata2 first.
Update the configuration files
defaultZone: http://root:root@hadoop3:8761/eureka/
Change the registry address to the IP of the server Eureka is deployed on.
Do the same for each microservice.
The configuration files shown above already reflect this change.
Deployment
- Deploy the Eureka registry first
Create /usr/chl/springcloud/eureka
Upload the jars directory and the application jar
- Start the registry
Start the Eureka service registry
nohup java -cp /usr/chl/springcloud/eureka/xz_bigdata_springcloud_eureka-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.eureka.EurekaApplication &
View the log
tail -f nohup.out
- Deploy esquery
Start the esquery microservice
nohup java -cp /usr/chl/springcloud/esquery/xz_bigdata_springcloud_esquery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.es.ESqueryApplication &
- Deploy hbasequery
Start the hbasequery microservice
nohup java -cp /usr/chl/springcloud/hbasequery/xz_bigdata_springcloud_hbasequery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.HbaseQueryApplication &
9. Execution (example requests)
hadoop3:8002/hbase/getRelation?field=phone&fieldValue=18609765012
hadoop3:8002/hbase/search1?table=phone&rowkey=18609765012
hadoop3:8002/hbase/getHbase?table=send_mail&rowkey=65497873@qq.com
hadoop3:8002/hbase/getHbase?table=phone&rowkey=18609765012
hadoop3:8002/hbase/search/phone/18609765012
hadoop3:8003/es/getAllCount
hadoop3:8003/es/getBaseInfo
hadoop3:8003/es/getLocus
hadoop3:8003/es/group
XIII. Appendix
1. Test data
mail_source1_1111101.txt
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300088 65497873@qq.com 1789090763 11111111@qq.com 1789097863 今天出去打球吗 send
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300085 65497873@qq.com 1789090764 22222222@qq.com 1789097864 今天出去打球吗 send
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300088 65497873@qq.com 1789090763 33333333@qq.com 1789097863 今天出去打球吗 send
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300085 65497873@qq.com 1789090764 44444444@qq.com 1789097864 今天出去打球吗 send
000000000000000 000000000000000 23.000001 24.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 1323243@qq.com 1789098763 43432543@qq.com 1789098863 今天出去打球吗 send
000000000000000 000000000000000 24.000001 25.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 1323243@qq.com 1789098764 43432543@qq.com 1789098864 今天出去打球吗 send
000000000000000 000000000000000 23.000001 24.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 1323243@qq.com 1789098763 43432543@qq.com 1789098863 今天出去打球吗 send
000000000000000 000000000000000 24.000001 25.000001 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 1323243@qq.com 1789098764 43432543@qq.com 1789098864 今天出去打球吗 send
qq_source1_1111101.txt
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300388 xz 18609765012 ls 1789000653
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300545 xz 18609765012 ls 1789000343
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300658 xz 18609765012 ls 1789000542
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300835 xz 18609765012 ls 1789000263
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557300388 xz 18609765016 ls 1789001653
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557302235 xz 18609765016 ls 1789001343
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303658 xz 18609765016 ls 1789001542
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303835 xz 18609765016 ls 1789001263
000000000000011 000000000000011 23.000031 24.000041 4c-6f-c7-3d-a4-3d 9g-gd-3h-3k-ld-3f 32109246 1557300001 xz 18609765014 ls 1789050653
000000000000011 000000000000011 24.000031 25.000051 7c-8e-d4-a6-3d-5c 54-hg-gi-yx-ef-ge 32109246 1557300005 xz 18609765015 ls 1789070343
000000000000011 000000000000011 23.000031 24.000061 8c-g1-ed-7b-5f-1b 47-fy-vv-hs-ue-fd 32109246 1557300008 xz 18609765017 ls 1789080542
000000000000011 000000000000011 24.000031 25.000071 0c-76-2a-b1-3c-1a f5-nw-hf-ud-ht-ea 32109246 1557300115 xz 18609765010 ls 1789082263
wechat_source1_1111101.txt
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000000 000000000000000 23.000000 24.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305988 andiy 18609765432 judy 1789098762
000000000000000 000000000000000 24.000000 25.000000 aa-aa-aa-aa-aa-aa bb-bb-bb-bb-bb-bb 32109231 1557305985 andiy 18609765432 judy 1789098763
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300388 xz 18609765012 ls 1789000653
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300545 xz 18609765012 ls 1789000343
000000000000011 000000000000011 23.000011 24.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300658 xz 18609765012 ls 1789000542
000000000000011 000000000000011 24.000011 25.000011 1c-41-cd-b1-df-3f 1b-3d-zg-fg-ef-1b 32109246 1557300835 xz 18609765012 ls 1789000263
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557300388 xz 18609765016 ls 1789001653
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557302235 xz 18609765016 ls 1789001343
000000000000011 000000000000011 23.000021 24.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303658 xz 18609765016 ls 1789001542
000000000000011 000000000000011 24.000021 25.000031 1c-31-5d-b1-6f-3f 3y-5g-g6-du-bv-2f 32109246 1557303835 xz 18609765016 ls 1789001263
000000000000011 000000000000011 23.000031 24.000041 4c-6f-c7-3d-a4-3d 9g-gd-3h-3k-ld-3f 32109246 1557300001 xz 18609765014 ls 1789050653
000000000000011 000000000000011 24.000031 25.000051 7c-8e-d4-a6-3d-5c 54-hg-gi-yx-ef-ge 32109246 1557300005 xz 18609765015 ls 1789070343
000000000000011 000000000000011 23.000031 24.000061 8c-g1-ed-7b-5f-1b 47-fy-vv-hs-ue-fd 32109246 1557300008 xz 18609765017 ls 1789080542
000000000000011 000000000000011 24.000031 25.000071 0c-76-2a-b1-3c-1a f5-nw-hf-ud-ht-ea 32109246 1557300115 xz 18609765010 ls 1789082263
2. Kafka
Create a topic with 1 replica and 3 partitions
kafka-topics --zookeeper hadoop1:2181 --topic chl_test7 --create --replication-factor 1 --partitions 3
Delete a topic
kafka-topics --zookeeper hadoop1:2181 --delete --topic chl_test7
List all topics
kafka-topics --zookeeper hadoop1:2181 --list
Consume
kafka-console-consumer --bootstrap-server hadoop1:9092 --topic chl_test7 --from-beginning
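To push test records into the topic from code instead of the console producer, a minimal sketch using the plain Kafka producer API; it assumes the kafka-clients dependency, the broker and topic follow the commands above, and the field separator is an assumption that must match whatever the streaming job's parser expects:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class TestDataProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // one test record in the layout of mail_source1_1111101.txt;
            // the tab separator here is an assumption, use what the parser expects
            String line = String.join("\t",
                    "000000000000011", "000000000000011", "23.000011", "24.000011",
                    "1c-41-cd-b1-df-3f", "1b-3d-zg-fg-ef-1b", "32109246", "1557300088",
                    "65497873@qq.com", "1789090763", "11111111@qq.com", "1789097863",
                    "今天出去打球吗", "send");
            producer.send(new ProducerRecord<>("chl_test7", line));
        } // close() flushes any pending records
    }
}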
3. kafka2es
Start the Spark Streaming job
spark-submit --master yarn-cluster --num-executors 1 --driver-memory 500m --executor-memory 1g --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar chl_test7 chl_test7
spark-submit
--master yarn-cluster    // run on the YARN cluster
--num-executors 1    // number of executor processes
--driver-memory 500m    // driver memory
--executor-memory 1g    // memory per executor
--executor-cores 1    // cores (threads) per executor
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',')    // dependency jars to ship
--class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming    // main class
/usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar    // application jar location
4. YARN
Dump a YARN application's logs to a file
yarn logs -applicationId application_1561627166793_0002 > log.log
View the log
more log.log
cat log.log
5. The CDH console (port 7180) won't open
Check the cloudera-scm-server status
service cloudera-scm-server status
View the cloudera-scm-server log
cat /var/log/cloudera-scm-server/cloudera-scm-server.log
Restart cloudera-scm-server
service cloudera-scm-server restart
6. CDH JDK setting (important)
/usr/local/jdk1.8
7. Alerting (warning streaming task)
spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.warn.WarningStreamingTask /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
8. Kibana Dev Tools
GET _search
{
"query": {
"match_all": {}
}
}
GET _cat/indices
DELETE tanslator_test1111
DELETE qq
DELETE wechat
DELETE mail
GET wechat
GET mail
GET _search
GET mail/_search
GET mail/_mapping
PUT mail
PUT mail/mail/_mapping
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"send_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_time":{"type": "long"},
"accept_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_time":{"type": "long"},
"mail_content":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"mail_type":{"type": "keyword"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
GET qq/_search
GET qq/_mapping
PUT qq
PUT qq/qq/_mapping
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
GET wechat/_search
GET wechat/_mapping
PUT wechat
PUT wechat/wechat/_mapping
{
"_source": {
"enabled": true
},
"properties": {
"imei":{"type": "keyword"},
"imsi":{"type": "keyword"},
"longitude":{"type": "double"},
"latitude":{"type": "double"},
"phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"collect_time":{"type": "long"},
"username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"message_time":{"type": "long"},
"id":{"type": "keyword"},
"table":{"type": "keyword"},
"filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
"absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
}
}
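The same index setup can also be scripted from Java with Jest's index actions instead of pasting into Dev Tools. A sketch with only two of the mail fields shown; extend it with the full mapping above, and treat the ES address as an example value:

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.indices.CreateIndex;
import io.searchbox.indices.mapping.PutMapping;

public class MailIndexSetupSketch {
    public static void main(String[] args) throws Exception {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(
                new HttpClientConfig.Builder("http://192.168.116.201:9200").build());
        JestClient client = factory.getObject();

        // PUT mail
        client.execute(new CreateIndex.Builder("mail").build());

        // PUT mail/mail/_mapping (only two fields shown; extend with the full mapping above)
        String mapping = "{\"properties\":{"
                + "\"phone_mac\":{\"type\":\"text\",\"fields\":"
                + "{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}},"
                + "\"collect_time\":{\"type\":\"long\"}}}";
        client.execute(new PutMapping.Builder("mail", "mail", mapping).build());

        client.close();
    }
}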
9. Hive
Write Kafka data into Hive
spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.Kafka2HiveTest /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
show tables;
hdfs dfs -ls /apps/hive/warehouse/external
hdfs dfs -rm -r /apps/hive/warehouse/external/mail
drop table mail;
desc qq;
select * from qq limit 1;
Note: if the CDH Hive version does not match its corresponding Spark version, the query below will fail.
select count(*) from qq;
Merge small files (scheduled daily at 01:00 via cron)
crontab -e
0 1 * * * spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.CombineHdfs /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
10. ZooKeeper
Start the ZooKeeper client
zookeeper-client
Clear consumer offsets
rmr /consumers/WarningStreamingTask2/offsets
rmr /consumers/Kafka2HiveTest/offsets
rmr /consumers/DataRelationStreaming1/offsets
11. HBase
spark-submit --master local[1] --num-executors 1 --driver-memory 500m --executor-memory 1g --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hbase.DataRelationStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
hbase shell
list
create 't1','cf'
desc 't1'
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','66666666'
put 't1','aa-aa-aa-aa-aa-aa','cf:weixin','weixin1'
put 't1','aa-aa-aa-aa-aa-aa','cf:mail','66666@qq.com'
scan 't1'
Enable multiple versions on the table
alter 't1',{NAME=>'cf',VERSIONS=>50}
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','77777777'
get 't1','aa-aa-aa-aa-aa-aa',{COLUMN=>'cf',VERSIONS=>10}
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','55555555'
put 't1','aa-aa-aa-aa-aa-aa','cf:qq','88888888',1290300544
Run DataRelationStreaming
scan 'test:relation'
get 'test:username','andiy'
scan 'test:relation'
For mail data, substitute the actual phone_mac (or mail) rowkey for the empty rowkey below:
get 'test:relation','',{COLUMN=>'cf',VERSIONS=>10}
disable 'test:imei'
drop 'test:imei'
disable 'test:imsi'
drop 'test:imsi'
disable 'test:phone'
drop 'test:phone'
disable 'test:phone_mac'
drop 'test:phone_mac'
disable 'test:relation'
drop 'test:relation'
disable 'test:send_mail'
drop 'test:send_mail'
disable 'test:username'
drop 'test:username'
12. Spring Cloud
Start the Eureka service registry
nohup java -cp /usr/chl/springcloud/eureka/xz_bigdata_springcloud_eureka-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.eureka.EurekaApplication &
View the log
tail -f nohup.out
Start the esquery microservice
nohup java -cp /usr/chl/springcloud/esquery/xz_bigdata_springcloud_esquery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.es.ESqueryApplication &
Start the hbasequery microservice
nohup java -cp /usr/chl/springcloud/hbasequery/xz_bigdata_springcloud_hbasequery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.HbaseQueryApplication &