Enterprise Network Log Analysis


I. Background Data Overview

1. What data can WiFi provide?
Phone number
Organization
Web page snapshots
Forum posts
Weibo posts
Email
IM chats
Form data
App usage

2. The value of WiFi
Customer experience: convenience for customers, basic infrastructure
Customer data: precision marketing, capturing customers' online behavior, collecting customer information, an additional channel for reaching customers

3. How WiFi data is captured
A Wi-Fi network can capture the IMSI numbers of nearby smartphones. The root cause of this wireless tracking and monitoring lies in the way smartphones (both Android and iOS devices) connect to Wi-Fi networks.

Two protocols are widely implemented in most modern mobile operating systems:
Extensible Authentication Protocol (EAP)
Authentication and Key Agreement (AKA)

These protocols allow a smartphone to authenticate to known Wi-Fi networks with its own IMSI number, so the device connects to WiFi automatically without any interaction from its owner.

4. WiFi data applications
WiFi data application diagram

Profiling (user portrait) system

5. Data architecture
Architecture diagram of the network user-behavior data

6. Data structure
(1) File naming
DataType_Source_UUID.txt
e.g. BASE_SOURCE_UUID.txt

Define a single standard for field names and a single standard for data types; a small parsing sketch follows.
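
As an illustration of the naming convention above, here is a minimal, hypothetical sketch (not one of the project modules) that splits a file name such as BASE_SOURCE_UUID.txt into its data type, source and UUID parts:

public class FileNameParser {

    // Split "DataType_Source_UUID.txt" into [dataType, source, uuid]
    public static String[] parse(String fileName) {
        String base = fileName.endsWith(".txt")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        String[] parts = base.split("_", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = parse("wechat_source1_1c8f3a2b.txt");
        System.out.println("type=" + parts[0] + ", source=" + parts[1] + ", uuid=" + parts[2]);
    }
}
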
(2) Fields
(3) Common fields

Field           Description                         Example / Format
imei            IMEI, unique handset identifier     15-17 digits
imsi            IMSI, unique SIM card identifier    460011418603055, 14-15 digits
longitude       Longitude                           6 decimal places
latitude        Latitude                            6 decimal places
phone_mac       Handset MAC                         format must be unified (cleaned) to aa-aa-aa-aa-aa-aa (hex characters 0-9, a-f); see the sketch after this table
device_mac      MAC of the collection device        format must be unified (cleaned) to aa-aa-aa-aa-aa-aa (any digits and letters)
device_number   Collection device number
collect_time    Collection time
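
The two MAC columns above only say that the format "must be unified (cleaned)". A minimal sketch of such cleaning, assuming the target form aa-aa-aa-aa-aa-aa from the table (the class name is made up for illustration):

public class MacCleaner {

    // Normalize any MAC notation (AA:BB:CC:DD:EE:FF, aabbccddeeff, ...) to "aa-bb-cc-dd-ee-ff"
    public static String normalize(String rawMac) {
        if (rawMac == null) {
            return null;
        }
        // Keep only hex characters and lower-case them
        String hex = rawMac.replaceAll("[^0-9a-fA-F]", "").toLowerCase();
        if (hex.length() != 12) {
            return null; // not a 48-bit MAC address
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12; i += 2) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(hex, i, i + 2);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("AA:BB:CC:DD:EE:FF")); // aa-bb-cc-dd-ee-ff
    }
}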

WeChat data (wechat)

Field            Description
username         WeChat nickname
phone            Phone number
object_username  The other party's WeChat account
send_message     Sent content (cannot be decrypted)
accept_message   Received content (cannot be decrypted)
message_time     Message time

Mail data (Mail)

Field         Description
send_mail     Sender address
send_time     Send time
accept_mail   Recipient address
accept_time   Receive time
mail_content  Mail content
mail_type     Whether sent or received (send / accept)

Search data (Search)

Field           Description
search_content  Search content
search_url      Search URL
search_type     Search engine
search_time     Search time

Base data (Base)

Field        Description
name         Name
is_marry     Marital status
phone        Phone number
address      Registered (household) address
address_new  Current residential address
birthday     Date of birth
car_number   License plate number
idcard       ID card number

Question: how are the data structures and fields determined?
You define them yourself according to the actual requirements.

II. Building the Base Architecture

1. Create the Maven parent project

Top-level pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.hsiehchou</groupId>
  <artifactId>xz_bigdata2</artifactId>
  <packaging>pom</packaging>
  <version>1.0-SNAPSHOT</version>
  <modules>
    <module>xz_bigdata_common</module>
    <module>xz_bigdata_es</module>
    <module>xz_bigdata_flume</module>
    <module>xz_bigdata_hbase</module>
    <module>xz_bigdata_kafka</module>
    <module>xz_bigdata_redis</module>
    <module>xz_bigdata_resources</module>
    <module>xz_bigdata_spark</module>
  </modules>

  <name>xz_bigdata2</name>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <cdh.version>cdh5.14.0</cdh.version>
    <junit.version>4.12</junit.version>
    <org.slf4j.version>1.7.5</org.slf4j.version>
    <zookeeper.version>3.4.5</zookeeper.version>
    <scala.version>2.10.5</scala.version>
  </properties>

  <repositories>
    <repository>
      <id>Akka repository</id>
      <url>https://repo.akka.io/releases</url>
    </repository>
    <!-- Cloudera repository -->
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <!-- Logging dependency -->
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${org.slf4j.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

2. Overall project structure

Overall structure of xz_bigdata2

3. Create the sub-modules

Select xz_bigdata2, right-click and choose New > Module to create a Maven sub-module; all of the modules shown in the figure above are created this way.
Note: use JDK 1.8 or later for development. The code relies on JDK 1.8-specific features, so building with an older JDK will fail; JDK 1.8 keeps development straightforward.

Ctrl+Shift+Alt+S: opens Project Structure, where the project setup can be adjusted.

Ctrl+Alt+S: opens Settings, where the local Maven installation can be configured (under Build, Execution, Deployment > Build Tools > Maven, point Maven to your local repository path).

Settings also contains the Plugins page mentioned earlier; plugins such as Maven Helper and the Scala plugin used later can be installed there.

III. Common Module Development

pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata2</artifactId>
        <groupId>com.hsiehchou</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_common</artifactId>

    <name>xz_bigdata_common</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <ant.version>1.9.1</ant.version>
        <jaxen.version>1.1.6</jaxen.version>
        <guava.version>12.0.1</guava.version>
        <dom4j.version>1.6.1</dom4j.version>
        <fastjson.version>1.2.5</fastjson.version>
        <disruptor.version>3.3.6</disruptor.version>
        <org.slf4j.version>1.7.5</org.slf4j.version>
        <commons.io.version>2.4</commons.io.version>
        <httpclient.version>4.2.5</httpclient.version>
        <commons.exec.version>1.3</commons.exec.version>
        <commons.lang.version>2.4</commons.lang.version>
        <commons-vfs2.version>2.1</commons-vfs2.version>
        <commons.math3.version>3.4.1</commons.math3.version>
        <commons.logging.version>1.2</commons.logging.version>
        <commons-httpclient.version>3.1</commons-httpclient.version>
        <commons.collections4.version>4.1</commons.collections4.version>
        <commons.configuration.version>1.6</commons.configuration.version>
        <mysql.connector.version>5.1.46</mysql.connector.version>
        <commons-dbutils.version>1.6</commons-dbutils.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>commons-dbutils</groupId>
            <artifactId>commons-dbutils</artifactId>
            <version>${commons-dbutils.version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql.connector.version}</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_resources</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${org.slf4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>${org.slf4j.version}</version>
        </dependency>

        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>${commons.io.version}</version>
        </dependency>

        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>${commons.lang.version}</version>
        </dependency>

        <dependency>
            <groupId>commons-configuration</groupId>
            <artifactId>commons-configuration</artifactId>
            <version>${commons.configuration.version}</version>
        </dependency>

        <dependency>
            <groupId>dom4j</groupId>
            <artifactId>dom4j</artifactId>
            <version>${dom4j.version}</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${fastjson.version}</version>
        </dependency>

        <!-- <dependency>
             <groupId>log4j</groupId>
             <artifactId>log4j</artifactId>
             <version>1.2.17</version>
         </dependency>-->
    </dependencies>

</project>

1. config/ConfigUtil.java: configuration file loader

package com.hsiehchou.common.config;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfigUtil {

    private static Logger LOG = LoggerFactory.getLogger(ConfigUtil.class);

    private static ConfigUtil configUtil;

    public static ConfigUtil getInstance(){

        if(configUtil == null){
            configUtil = new ConfigUtil();
        }
        return configUtil;
    }

    public Properties getProperties(String path){
        Properties properties = new Properties();
        try {
            LOG.info("Loading configuration file " + path);
            // Read the configuration file from the classpath as a stream
            InputStream in = this.getClass().getClassLoader().getResourceAsStream(path);
            if (in == null) {
                LOG.error("Configuration file " + path + " was not found on the classpath");
                return properties;
            }
            properties.load(in);
            LOG.info("Loaded configuration file " + path + " successfully");
        } catch (IOException e) {
            LOG.error("Failed to load configuration file " + path, e);
        }
        return properties;
    }

    public static void main(String[] args) {
        ConfigUtil instance = ConfigUtil.getInstance();
        Properties properties = instance.getProperties("common/datatype.properties");
        //Properties properties = instance.getProperties("spark/relation.properties");

       // properties.get("relationfield");
        System.out.println(properties);
    }
}

2. config/JsonReader.java

package com.hsiehchou.common.config;

import org.apache.commons.io.FileUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;

public class JsonReader {
    private static Logger LOG = LoggerFactory.getLogger(JsonReader.class);

    public static String readJson(String json_path){
        JsonReader jsonReader = new JsonReader();
        return jsonReader.getJson(json_path);
    }

    private String getJson(String json_path){
        String jsonStr = "";
        try {
            String path = getClass().getClassLoader().getResource(json_path).toString();
            path = path.replace("\\", "/");
            if (path.contains(":")) {
                path = path.replace("file:/","");
            }
            jsonStr = FileUtils.readFileToString(new File(path), "UTF-8");
            LOG.info("Read JSON file {} successfully", path);
        } catch (Exception e) {
            LOG.error("Failed to read JSON file", e);
        }
        return jsonStr;
    }
}

3. adjuster/Adjuster.java: data-adjustment interface

package com.hsiehchou.common.adjuster;

/**
 * Data adjustment interface
 */
public interface Adjuster<T, E> {
    E doAdjust(T data);
}

4. adjuster/StringAdjuster.java

package com.hsiehchou.common.adjuster;

public abstract class StringAdjuster<E> implements Adjuster<String, E> {}

5. file/FileCommon.java

package com.hsiehchou.common.file;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.URL;
import java.util.List;

public class FileCommon {

    private FileCommon(){}

    /**
     * Check whether a file exists
     * @param name
     * @return
     */
    public static boolean exist(String name){
        return exist(new File(name));
    }

    public static boolean exist(File file){
        return file.exists();
    }

    /**
     * Create a file, creating parent directories if needed
     * @param file
     * @return
     * @throws IOException
     */
    public static boolean createFile(String file) throws IOException {
        return createFile(new File(file));
    }

    public static boolean createFile(File file) throws IOException {
        if(!file.exists()){
            if(file.isDirectory()){
                return file.mkdirs();
            }else{
                File parentDir = file.getParentFile();
                if(!parentDir.exists()) {
                    if (parentDir.mkdirs()) {
                        return file.createNewFile();
                    }
                }else{
                    return file.createNewFile();
                }
            }
        }
        return true;
    }

    /**
     * Read the contents of a file line by line
     * @param file
     * @return
     * @throws IOException
     */
    public static List<String> readLines(String file) throws IOException{
        return readLines(new File(file), "UTF-8");
    }

    public static List<String> readLines(String file, String encoding) throws IOException{
        return readLines(new File(file), encoding);
    }

    public static List<String> readLines(File file, String encoding) throws IOException {

        List<String> lines = null;
        if(FileCommon.exist(file)) {
            FileInputStream fileInputStream = new FileInputStream(file);
            lines = IOUtils.readLines(fileInputStream, encoding);
            fileInputStream.close();
        }
        return lines;
    }

    /**
     * Get the file name prefix (name without the extension)
     * @param fileName
     * @return
     */
    public static String getPrefix(String fileName){
        String prefix = fileName;
        int pos = fileName.lastIndexOf(".");
        if (pos != -1){
            prefix = fileName.substring(0,pos);
        }
        return prefix;
    }

    /**
     * Get the file name suffix (extension)
     * @param fileName
     * @return
     */
    public static String getFilePostfix(String fileName){
        String filePostfix = fileName.substring(fileName.lastIndexOf(".") + 1);
        return filePostfix.toLowerCase();
    }

    /**
     * Delete a file
     * @param filePath
     * @return
     */
    public static boolean delFile(String filePath) {
        boolean flag = false;
        File file = new File(filePath);
        if (file.isFile() && file.exists()) {
            flag = file.delete();
        }
        return flag;
    }

    /**
     * Move a file
     * @param oldPath
     * @param newPath
     * @return
     */
    public static boolean mvFile(String oldPath,String newPath){
        boolean flag = false;
        File oldfile = new File(oldPath);
        File newfile = new File(newPath);
        if(oldfile.isFile() && oldfile.exists()){
            if(newfile.exists()){
                delFile(newfile.getAbsolutePath());
            }
            flag = oldfile.renameTo(newfile);
        }
        return flag;
    }

    /**
     * Delete a directory recursively
     * @param dir
     * @return
     */
    public static boolean deleteDir(File dir){
        if (dir.isDirectory()) {
            String[] children = dir.list();
            // Recursively delete the children of this directory first
            if(children!=null){
                for (int i=0; i<children.length; i++) {
                    boolean success = deleteDir(new File(dir, children[i]));
                    if (!success) {
                        return false;
                    }
                }
            }
        }
        // The directory is now empty and can be deleted
        return dir.delete();
    }

    // Recursively create the parent directories; used by the decompression-related classes
    public static void mkdirs(File file) {
        File parent = file.getParentFile();
        if (parent != null && (!parent.exists())) {
            parent.mkdirs();
        }
    }

    public static String getJarFilePathByClass(String clazz) throws ClassNotFoundException {
        return getJarFilePathByClass(Class.forName(clazz));
    }

    public static String getJarFileDirByClass(String clazz) throws ClassNotFoundException {
        return getJarFileDirByClass(Class.forName(clazz));
    }

    public static String getJarFilePathByClass(Class<?> clazz){
        return new File(clazz.getProtectionDomain().getCodeSource().getLocation().getFile()).getAbsolutePath();
    }

    public static String getJarFileDirByClass(Class<?> clazz){
        return new File(getJarFilePathByClass(clazz)).getParent();
    }

    public static String getAbstractPath(String abstractPath) throws Exception{
        URL url = FileCommon.class.getClassLoader().getResource(abstractPath);
        System.out.println("配置文件路径为" + url);
        File file = new File(url.getFile());
        String content= FileUtils.readFileToString(file,"UTF-8");
        return content;
    }

    public static String getAbstractPath111(String abstractPath) throws Exception{
        File file = new File(abstractPath);
        String content= FileUtils.readFileToString(file,"UTF-8");
        return content;
    }
}

6. filter: top-level data-filtering interface

package com.hsiehchou.common.filter;

/**
 * Top-level data filtering interface
 */
public interface Filter<T> {
    boolean filter(T obj);
}

7. net/HttpRequest.java

package com.hsiehchou.common.net;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.Map;

public class HttpRequest {
    private static final Logger LOG = LoggerFactory.getLogger(HttpRequest.class);

    /**
     * Send a GET request to the given URL
     * @param url  the request URL
     * @param param  request parameters in the form name1=value1&name2=value2
     * @return the response body of the remote resource
     */
    public static String sendGet(String url, String param) {
        String result = "";
        BufferedReader in = null;
        try {
            String urlNameString = url + "?" + param;
            URL realUrl = new URL(urlNameString);
            // Open a connection to the URL
            URLConnection connection = realUrl.openConnection();
            // Set common request properties
            connection.setRequestProperty("accept", "*/*");
            connection.setRequestProperty("connection", "Keep-Alive");
            connection.setRequestProperty("user-agent",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
            // Establish the actual connection
            connection.connect();
            // Collect all response header fields (not used here)
            //Map<String, List<String>> map = connection.getHeaderFields();
            // Read the URL's response through a BufferedReader
            in = new BufferedReader(new InputStreamReader(connection.getInputStream(),"UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                result += line;
            }
        } catch (Exception e) {
            LOG.error("Exception while sending GET request! " + (url + param), e);
        }
        // Close the input stream in a finally block
        finally {
            try {
                if (in != null) {
                    in.close();
                }
            } catch (Exception e2) {
                e2.printStackTrace();
            }
        }
        return result;
    }

    /**
     * Send a GET request with an Authorization header to the given URL
     * @param url  the request URL
     * @param param  request parameters in the form name1=value1&name2=value2
     * @return the response body of the remote resource
     */
    public static String sendGet(String url, String param,String authorization) {
        String result = "";
        BufferedReader in = null;
        try {
            String urlNameString = url + "?" + param;
            URL realUrl = new URL(urlNameString);
            // Open a connection to the URL
            URLConnection connection = realUrl.openConnection();
            // Set common request properties
            connection.setRequestProperty("accept", "*/*");
            connection.setRequestProperty("connection", "Keep-Alive");
            connection.setRequestProperty("Authorization", authorization);
            connection.setRequestProperty("user-agent",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");

            // Establish the actual connection
            connection.connect();
            // Read the URL's response through a BufferedReader
            in = new BufferedReader(new InputStreamReader(
                    connection.getInputStream(),"UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                result += line;
            }
        } catch (Exception e) {
            LOG.error("Exception while sending GET request! " + (url + param), e);
        }
        // Close the input stream in a finally block
        finally {
            try {
                if (in != null) {
                    in.close();
                }
            } catch (Exception e2) {
                e2.printStackTrace();
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception{

    }

    /**
     * Send a POST request to the given URL
     * @param url  the request URL
     * @param param  request parameters in the form name1=value1&name2=value2
     * @return the response body of the remote resource
     */
    public static String sendPost(String url, String param) {
        PrintWriter out = null;
        BufferedReader in = null;
        String result = "";
        try {
            URL realUrl = new URL(url);
            // Open a connection to the URL
            URLConnection conn = realUrl.openConnection();
            // Set common request properties
            conn.setRequestProperty("Content-Type","application/json");
            //conn.setInstanceFollowRedirects(false);
            // conn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
            conn.setRequestProperty("accept", "*/*");
            conn.setRequestProperty("connection", "Keep-Alive");
            conn.setRequestProperty("user-agent",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
            // Read timeout; setDoOutput/setDoInput must be enabled for a POST request
            conn.setReadTimeout(30000);
            conn.setDoOutput(true);
            conn.setDoInput(true);
            // Get the output stream of the URLConnection
            out = new PrintWriter(conn.getOutputStream());
            // Send the request parameters
            out.print(param);
            // Flush the output stream buffer
            out.flush();
            // Read the URL's response through a BufferedReader

            InputStream inputStream = conn.getInputStream();
            in = new BufferedReader(new InputStreamReader(inputStream,"UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                result += line;
            }
        }
        catch (IOException e) {
            LOG.error("Exception while sending POST request! " + (url + param), e);
        }
        // Close the output and input streams in a finally block
        finally{
            try{
                if(out!=null){
                    out.close();
                }
                if(in!=null){
                    in.close();
                }
            }
            catch(IOException ex){
                ex.printStackTrace();
            }
        }
        return result;
    }

    /*
     * params: URL parameters; keys and values are URL-encoded as UTF-8
     */
    public static String sendPostMessage(String url1,Map<String,Object> params){
        String response = null;
        Reader in = null;
        try {
            // Build the target URL
            URL url = new URL(url1);
            // Encode the parameters into a form-style request body
            StringBuilder postData = new StringBuilder();
            for (Map.Entry<String,Object> param : params.entrySet()) {
                if (postData.length() != 0) postData.append('&');
                postData.append(URLEncoder.encode(param.getKey(), "UTF-8"));
                postData.append('=');
                postData.append(URLEncoder.encode(String.valueOf(param.getValue()), "UTF-8"));
            }
            byte[] postDataBytes = postData.toString().getBytes("UTF-8");
            URLConnection conn = url.openConnection();
            //URLConnection conn = url.openConnection();
            //conn.setRequestMethod("POST");
            //conn.setInstanceFollowRedirects(false);
            //conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Content-Length", String.valueOf(postDataBytes.length));
            conn.setDoOutput(true);
            conn.getOutputStream().write(postDataBytes);

            in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));

            StringBuilder sb = new StringBuilder();
            for (int c; (c = in.read()) >= 0;)
                sb.append((char)c);
            response = sb.toString();
           //System.out.println(response);
        } catch (IOException e) {
            LOG.error(null,e);
        }finally {
            if(in != null){
                try {
                    in.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return response;
    }

    /**
     * Send a POST request to the given URL and discard the response body
     * @param url  the request URL
     * @param param  request parameters in the form name1=value1&name2=value2
     */
    public static void sendPostWithoutReturn(String url, String param) {
        PrintWriter out = null;
        try {
            URL realUrl = new URL(url);
            // Open a connection to the URL
            HttpURLConnection conn = (HttpURLConnection )realUrl.openConnection();
            // Set common request properties
            conn.setRequestProperty("Content-Type","application/json");
            conn.setRequestProperty("accept", "*/*");
            conn.setRequestProperty("connection", "Keep-Alive");
            conn.setRequestProperty("user-agent",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");

            // Adjust the read timeout as needed
            conn.setReadTimeout(1000);
            // setDoOutput and setDoInput must be enabled for a POST request
            conn.setDoOutput(true);
            conn.setDoInput(true);
            // Get the output stream of the URLConnection
            out = new PrintWriter(conn.getOutputStream());
            // Send the request parameters
            out.print(param);
            // Flush the output stream buffer
            out.flush();
            // Check the response code
            if (conn.getResponseCode() == 200) {
                System.out.println("Connection succeeded, data sent...");
            } else {
                System.out.println("Connection failed, response code: "+conn.getResponseCode());
            }
        }
        catch (IOException e) {
            LOG.error("Exception while sending POST request! " + (url + param), e);
        }
        // Close the output stream in a finally block
        finally{
            if(out!=null){
                out.close();
            }
        }
    }
}
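
A short usage sketch of the helpers above; the URLs and request bodies below are placeholders for illustration, not endpoints defined elsewhere in the project:

import com.hsiehchou.common.net.HttpRequest;

public class HttpRequestDemo {
    public static void main(String[] args) {
        // GET: parameters are appended to the query string
        String getResult = HttpRequest.sendGet("http://localhost:8080/api/ping", "name=test");
        System.out.println(getResult);

        // POST: the parameter string is written as an application/json body (see sendPost above)
        String postResult = HttpRequest.sendPost("http://localhost:8080/api/echo",
                "{\"msg\":\"hello\"}");
        System.out.println(postResult);
    }
}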

8. netb/db/DBCommon: base class for opening and closing MySQL connections

package com.hsiehchou.common.netb.db;

import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.Properties;

public class DBCommon {

    private static Logger LOG = LoggerFactory.getLogger(DBCommon.class);
    private static String MYSQL_PATH = "common/mysql.properties";
    private static Properties properties = ConfigUtil.getInstance().getProperties(MYSQL_PATH);

    private static Connection conn ;
    private DBCommon(){}

    public static void main(String[] args) {
        System.out.println(properties);
        Connection xz_bigdata = DBCommon.getConn("test");
        System.out.println(xz_bigdata);
    }

    //TODO move the remaining hard-coded settings into the configuration file
    private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
    private static final String USER_NAME = properties.getProperty("user");
    private static final String PASSWORD = properties.getProperty("password");
    private static final String IP = properties.getProperty("db_ip");
    private static final String PORT = properties.getProperty("db_port");
    private static final String DB_CONFIG = "?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&autoReconnect=true&failOverReadOnly=false";

    static {
        try {
            Class.forName(JDBC_DRIVER);
        } catch (ClassNotFoundException e) {
            LOG.error(null, e);
        }
    }

    /**
     * Get a database connection
     * @param dbName
     * @return
     */
    public static Connection getConn(String dbName) {
        Connection conn = null;
        String  connstring = "jdbc:mysql://"+IP+":"+PORT+"/"+dbName+DB_CONFIG;
        try {
            conn = DriverManager.getConnection(connstring, USER_NAME, PASSWORD);
        } catch (SQLException e) {
            e.printStackTrace();
            LOG.error(null, e);
        }
        return conn;
    }

    /**
     * @param url eg:"jdbc:oracle:thin:@172.16.1.111:1521:d406"
     * @param driver eg:"oracle.jdbc.driver.OracleDriver"
     * @param user eg:"ucase"
     * @param password eg:"ucase123"
     * @return
     * @throws ClassNotFoundException
     * @throws SQLException
     */
    public static Connection getConn(String url, String driver, String user,
                                     String password) throws ClassNotFoundException, SQLException{
        Class.forName(driver);
        conn = DriverManager.getConnection(url, user, password);
        return  conn;
    }

    public static void close(Connection conn){
        try {
            if( conn != null ){
                conn.close();
            }
        } catch (SQLException e) {
            LOG.error(null,e);
        }
    }

    public static void close(Statement statement){
        try {
            if( statement != null ){
                statement.close();
            }
        } catch (SQLException e) {
            LOG.error(null,e);
        }
    }

    public static void close(Connection conn,PreparedStatement statement){
        try {
            if( conn != null ){
                conn.close();
            }
            if( statement != null ){
                statement.close();
            }
        } catch (SQLException e) {
            LOG.error(null,e);
        }
    }

    public static void close(Connection conn,Statement statement,ResultSet resultSet) throws SQLException{

        if( resultSet != null ){
            resultSet.close();
        }
        if( statement != null ){
            statement.close();
        }
        if( conn != null ){
            conn.close();
        }
    }
}
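
Since commons-dbutils is already declared in the Common pom, a connection obtained from DBCommon can be handed to a QueryRunner. A minimal sketch, assuming a database named test and a table user_info (both names are made up for illustration):

import com.hsiehchou.common.netb.db.DBCommon;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.MapListHandler;

import java.sql.Connection;
import java.util.List;
import java.util.Map;

public class DbCommonDemo {
    public static void main(String[] args) throws Exception {
        // Connection settings come from common/mysql.properties in the Resources module
        Connection conn = DBCommon.getConn("test");
        try {
            QueryRunner runner = new QueryRunner();
            // MapListHandler turns every row of the result set into a Map<String, Object>
            List<Map<String, Object>> rows =
                    runner.query(conn, "SELECT * FROM user_info LIMIT 10", new MapListHandler());
            rows.forEach(System.out::println);
        } finally {
            DBCommon.close(conn);
        }
    }
}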

9. project/datatype/DataTypeProperties.java

package com.hsiehchou.common.project.datatype;

import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.*;

public class DataTypeProperties {
    private static final Logger logger = LoggerFactory.getLogger(DataTypeProperties.class);

    private static final String DATA_PATH = "common/datatype.properties";

    public static Map<String,ArrayList<String>> dataTypeMap = null;

    static {
        Properties properties = ConfigUtil.getInstance().getProperties(DATA_PATH);
        dataTypeMap = new HashMap<>();
        Set<Object> keys = properties.keySet();
        keys.forEach(key->{
            String[] split = properties.getProperty(key.toString()).split(",");
            dataTypeMap.put(key.toString(),new ArrayList<>(Arrays.asList(split)));
        });
    }

    public static void main(String[] args) {
        Map<String, ArrayList<String>> dataTypeMap = DataTypeProperties.dataTypeMap;
        System.out.println(dataTypeMap.toString());
    }
}
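
With the field lists from common/datatype.properties loaded into dataTypeMap, a raw collected line can be turned into a field-name/value map. A minimal sketch, assuming the values of one record are tab-separated in the configured field order (the separator is an assumption; it is not specified above):

import com.hsiehchou.common.project.datatype.DataTypeProperties;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class RecordParserDemo {

    // Map one raw line of the given data type to field-name/value pairs
    public static Map<String, String> parse(String dataType, String line) {
        ArrayList<String> fields = DataTypeProperties.dataTypeMap.get(dataType);
        Map<String, String> record = new HashMap<>();
        if (fields == null) {
            return record; // unknown data type
        }
        String[] values = line.split("\t", -1);
        int n = Math.min(fields.size(), values.length);
        for (int i = 0; i < n; i++) {
            record.put(fields.get(i), values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        // A made-up wechat record with tab-separated values in the configured field order
        String line = "352099001761481\t460011418603055\t116.397128\t39.916527\t"
                + "aa-bb-cc-dd-ee-ff\t11-22-33-44-55-66\tdev001\t1553303212\t"
                + "nick\t13800000000\tfriend\thello\thi\t1553303212";
        System.out.println(parse("wechat", line));
    }
}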

10. regex/Validation.java: validation utility class

package com.hsiehchou.common.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Validation utility class
 */
public class Validation {
    // ------------------ constant definitions
    /**
     * Email正则表达式=
     * "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$"
     * ;
     */
    // public static final String EMAIL =
    // "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";;
    public static final String EMAIL = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)+";

    /**
     * 电话号码正则表达式=
     * (^(\d{2,4}[-_-—]?)?\d{3,8}([-_-—]?\d{3,8})?([-_-—]?\d{1,7})?$)|
     * (^0?1[35]\d{9}$)
     */
    public static final String PHONE = "(^(\\d{2,4}[-_-—]?)?\\d{3,8}([-_-—]?\\d{3,8})?([-_-—]?\\d{1,7})?$)|(^0?1[35]\\d{9}$)";

    /**
     * 手机号码正则表达式=^(13[0-9]|15[0-9]|18[0-9])\d{8}$
     */
    public static final String MOBILE = "^((13[0-9])|(14[5-7])|(15[^4])|(17[0-8])|(18[0-9]))\\d{8}$";

    /**
     * Integer正则表达式 ^-?(([1-9]\d*$)|0)
     */
    public static final String INTEGER = "^-?(([1-9]\\d*$)|0)";

    /**
     * 正整数正则表达式 >=0 ^[1-9]\d*|0$
     */
    public static final String INTEGER_NEGATIVE = "^[1-9]\\d*|0$";

    /**
     * 负整数正则表达式 <=0 ^-[1-9]\d*|0$
     */
    public static final String INTEGER_POSITIVE = "^-[1-9]\\d*|0$";

    /**
     * Double正则表达式 ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
     */
    public static final String DOUBLE = "^-?([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0)$";

    /**
     * 正Double正则表达式 >=0 ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$ 
     */
    public static final String DOUBLE_NEGATIVE = "^[1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0$";

    /**
     * 负Double正则表达式 <= 0 ^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$
     */
    public static final String DOUBLE_POSITIVE = "^(-([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*))|0?\\.0+|0$";

    /**
     * 年龄正则表达式 ^(?:[1-9][0-9]?|1[01][0-9]|120)$ 匹配0-120岁
     */
    public static final String AGE = "^(?:[1-9][0-9]?|1[01][0-9]|120)$";

    /**
     * 邮编正则表达式 [0-9]\d{5}(?!\d) 国内6位邮编
     */
    public static final String CODE = "[0-9]\\d{5}(?!\\d)";

    /**
     * 匹配由数字、26个英文字母或者下划线组成的字符串 ^\w+$
     */
    public static final String STR_ENG_NUM_ = "^\\w+$";

    /**
     * 匹配由数字和26个英文字母组成的字符串 ^[A-Za-z0-9]+$
     */
    public static final String STR_ENG_NUM = "^[A-Za-z0-9]+";

    /**
     * 匹配由26个英文字母组成的字符串 ^[A-Za-z]+$
     */
    public static final String STR_ENG = "^[A-Za-z]+$";

    /**
     * 过滤特殊字符串正则 regEx=
     * "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
     */
    public static final String STR_SPECIAL = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";

    /***
     * 日期正则 支持: YYYY-MM-DD YYYY/MM/DD YYYY_MM_DD YYYYMMDD YYYY.MM.DD的形式
     */
    public static final String DATE_ALL = "((^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(10|12|0?[13578])([-\\/\\._]?)(3[01]|[12][0-9]|0?[1-9])$)"
            + "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(11|0?[469])([-\\/\\._]?)(30|[12][0-9]|0?[1-9])$)"
            + "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(0?2)([-\\/\\._]?)(2[0-8]|1[0-9]|0?[1-9])$)|(^([2468][048]00)([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([3579][26]00)"
            + "([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)"
            + "|(^([1][89][0][48])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][0][48])([-\\/\\._]?)"
            + "(0?2)([-\\/\\._]?)(29)$)"
            + "|(^([1][89][2468][048])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][2468][048])([-\\/\\._]?)(0?2)"
            + "([-\\/\\._]?)(29)$)|(^([1][89][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|"
            + "(^([2-9][0-9][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$))";

    /***
     * 日期正则 支持: YYYY-MM-DD
     */
    public static final String DATE_FORMAT1 = "(([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8]))))|((([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[3579][26])00))-02-29)";


    /**
     * URL正则表达式 匹配 http www ftp
     */
    public static final String URL = "^(http|www|ftp|)?(://)?(\\w+(-\\w+)*)(\\.(\\w+(-\\w+)*))*((:\\d+)?)(/(\\w+(-\\w+)*))*(\\.?(\\w)*)(\\?)?"
            + "(((\\w*%)*(\\w*\\?)*(\\w*:)*(\\w*\\+)*(\\w*\\.)*(\\w*&)*(\\w*-)*(\\w*=)*(\\w*%)*(\\w*\\?)*"
            + "(\\w*:)*(\\w*\\+)*(\\w*\\.)*"
            + "(\\w*&)*(\\w*-)*(\\w*=)*)*(\\w*)*)$";

    /**
     * 身份证正则表达式
     */
    public static final String IDCARD = "((11|12|13|14|15|21|22|23|31|32|33|34|35|36|37|41|42|43|44|45|46|50|51|52|53|54|61|62|63|64|65)[0-9]{4})"
            + "(([1|2][0-9]{3}[0|1][0-9][0-3][0-9][0-9]{3}"
            + "[Xx0-9])|([0-9]{2}[0|1][0-9][0-3][0-9][0-9]{3}))";

    /**
     * 机构代码
     */
    public static final String JIGOU_CODE = "^[A-Z0-9]{8}-[A-Z0-9]$";

    /**
     * 匹配数字组成的字符串 ^[0-9]+$
     */
    public static final String STR_NUM = "^[0-9]+$";

    // ------------------ validation methods
    /**
     * Returns true if the string is null or blank
     * @param str
     * @return boolean
     */
    public static synchronized boolean StrisNull(String str) {
        return null == str || str.trim().length() <= 0;
    }

    /**
     * 判断字段是非空 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean StrNotNull(String str) {
        return !StrisNull(str);
    }

    /**
     * 字符串null转空
     * @param str
     * @return boolean
     */
    public static String nulltoStr(String str) {
        return StrisNull(str) ? "" : str;
    }

    /**
     * 字符串null赋值默认值
     * @param str  目标字符串
     * @param defaut  默认值
     * @return  String
     */
    public static String nulltoStr(String str, String defaut) {
        return StrisNull(str) ? defaut : str;
    }

    /**
     * 判断字段是否为Email 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isEmail(String str) {
        return Regular(str, EMAIL);
    }

    /**
     * 判断是否为电话号码 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isPhone(String str) {
        return Regular(str, PHONE);
    }

    /**
     * 判断是否为手机号码 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isMobile(String str) {
        return RegularSJHM(str, MOBILE);
    }

    /**
     * 判断是否为Url 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isUrl(String str) {
        return Regular(str, URL);
    }

    /**
     * 判断字段是否为数字 正负整数 正负浮点数 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isNumber(String str) {
        return Regular(str, DOUBLE);
    }

    /**
     * 判断字段是否为INTEGER 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isInteger(String str) {
        return Regular(str, INTEGER);
    }

    /**
     * 判断字段是否为正整数正则表达式 >=0 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isINTEGER_NEGATIVE(String str) {
        return Regular(str, INTEGER_NEGATIVE);
    }

    /**
     * 判断字段是否为负整数正则表达式 <=0 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isINTEGER_POSITIVE(String str) {
        return Regular(str, INTEGER_POSITIVE);
    }

    /**
     * 判断字段是否为DOUBLE 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isDouble(String str) {
        return Regular(str, DOUBLE);
    }

    /**
     * 判断字段是否为正浮点数正则表达式 >=0 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isDOUBLE_NEGATIVE(String str) {
        return Regular(str, DOUBLE_NEGATIVE);
    }

    /**
     * 判断字段是否为负浮点数正则表达式 <=0 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isDOUBLE_POSITIVE(String str) {
        return Regular(str, DOUBLE_POSITIVE);
    }

    /**
     * 判断字段是否为日期 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isDate(String str) {
        return Regular(str, DATE_ALL);
    }

    /**
     * 验证2010-12-10
     * @param str
     * @return
     */
    public static boolean isDate1(String str) {
        return Regular(str, DATE_FORMAT1);
    }

    /**
     * 判断字段是否为年龄 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isAge(String str) {
        return Regular(str, AGE);
    }

    /**
     * 判断字段是否超长 字串为空返回fasle, 超过长度{leng}返回ture 反之返回false
     * @param str
     * @param leng
     * @return boolean
     */
    public static boolean isLengOut(String str, int leng) {
        return StrisNull(str) ? false : str.trim().length() > leng;
    }

    /**
     * 判断字段是否为身份证 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isIdCard(String str) {
        if (StrisNull(str))
            return false;
        if (str.trim().length() == 15 || str.trim().length() == 18) {
            return Regular(str, IDCARD);
        } else {
            return false;
        }
    }

    /**
     * 判断字段是否为邮编 符合返回ture
     * @param str
     * @return boolean
     */
    public static boolean isCode(String str) {
        return Regular(str, CODE);
    }

    /**
     * 判断字符串是不是全部是英文字母
     * @param str
     * @return boolean
     */
    public static boolean isEnglish(String str) {
        return Regular(str, STR_ENG);
    }

    /**
     * 判断字符串是不是全部是英文字母+数字
     * @param str
     * @return boolean
     */
    public static boolean isENG_NUM(String str) {
        return Regular(str, STR_ENG_NUM);
    }

    /**
     * 判断字符串是不是全部是英文字母+数字+下划线
     * @param str
     * @return boolean
     */
    public static boolean isENG_NUM_(String str) {
        return Regular(str, STR_ENG_NUM_);
    }

    /**
     * 过滤特殊字符串 返回过滤后的字符串
     * @param str
     * @return boolean
     */
    public static String filterStr(String str) {
        Pattern p = Pattern.compile(STR_SPECIAL);
        Matcher m = p.matcher(str);
        return m.replaceAll("").trim();
    }

    /**
     * 校验机构代码格式
     * @return
     */
    public static boolean isJigouCode(String str) {
        return Regular(str, JIGOU_CODE);
    }

    /**
     * 判断字符串是不是数字组成
     * @param str
     * @return boolean
     */
    public static boolean isSTR_NUM(String str) {
        return Regular(str, STR_NUM);
    }

    /**
     * 匹配是否符合正则表达式pattern 匹配返回true
     * @param str 匹配的字符串
     * @param pattern 匹配模式
     * @return boolean
     */
    private static boolean Regular(String str, String pattern) {
        if (null == str || str.trim().length() <= 0)
            return false;
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(str);
        return m.matches();
    }

    /**
     * 匹配是否符合正则表达式pattern 匹配返回true
     * @param str 匹配的字符串
     * @param pattern 匹配模式
     * @return boolean
     */
    private static boolean RegularSJHM(String str, String pattern) {
        if (null == str || str.trim().length() <= 0){
            return false;
        }
        if(str.contains("+86")){
            str=str.replace("+86","");
        }
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(str);
        return m.matches();
    }

    /**
     * description: match a timestamp in yyyyMMddHHmmss format (14 digits)
     * @param time
     * @return boolean
     */
    public static final String yyyyMMddHHmmss = "[0-9]{14}";

    public static boolean isyyyyMMddHHmmss(String time) {
        if (time == null) {
            return false;
        }
        boolean bool = time.matches(yyyyMMddHHmmss);
        return bool;
    }

    /**
     * description: match a MAC address in the form AA-BB-CC-DD-EE-FF (upper-case hex)
     * @param mac
     * @return boolean
     */
    public static final String isMac = "^[A-F0-9]{2}(-[A-F0-9]{2}){5}$";

    public static boolean isMac(String mac) {
        if (mac == null) {
            return false;
        }
        boolean bool = mac.matches(isMac);
        return bool;
    }

    /**
     * description: match a 10-digit Unix timestamp (seconds)
     * @param timestamp
     * @return boolean
     */
    public static final String longtime = "[0-9]{10}";

    public static boolean isTimestamp(String timestamp) {
        if (timestamp == null) {
            return false;
        }
        boolean bool = timestamp.matches(longtime);
        return bool;
    }

    /**
     * 判断字段是否为datatype 符合返回ture
     * @param str
     * @return boolean
     */
    public static final String DATATYPE = "^\\d{7}$";
    public static boolean isDATATYPE(String str) {
        return Regular(str, DATATYPE);
    }


    /**
     * 判断字段是否为QQ 符合返回ture
     * @param str
     * @return boolean
     */
    public static final String QQ = "^\\d{5,15}$";
    public static boolean isQQ(String str) {
        return Regular(str, QQ);
    }


    /**
     * 判断字段是否为IMSI 符合返回ture
     * @param str
     * @return boolean
     */
    //public static final String IMSI = "^4600[0,1,2,3,4,5,6,7,9]\\d{10}|(46011|46020)\\d{10}$";
    public static final String IMSI = "^[1-9][0-9][0-9]0[0,1,2,3,4,5,6,7,9]\\d{10}|[1-9][0-9][0-9](11|20)\\d{10}$";
    public static boolean isIMSI(String str) {
        return Regular(str, IMSI);
    }

    /**
     * 判断字段是否为IMEI 符合返回ture
     * @param str
     * @return boolean
     */
    public static final String IMEI = "^\\d{8}$|^[a-fA-F0-9]{14}$|^\\d{15}$";
    public static boolean isIMEI(String str) {return Regular(str, IMEI);}

    /**
     * 判断字段是否为CAPTURETIME 符合返回ture
     * @param str
     * @return boolean
     */


    public static final String CAPTURETIME = "^\\d{10}|(20[0-9][0-9])\\d{10}$";
    public static boolean isCAPTURETIME(String str) {return Regular(str, CAPTURETIME);}

    /**
     * description: check the authentication type code (7 digits)
     * @param str
     * @return boolean
     */
    public static final String AUTH_TYPE = "^\\d{7}$";
    public static boolean isAUTH_TYPE(String str) {return Regular(str, AUTH_TYPE);}

    /**
     * description: check the FIRM_CODE (9 digits)
     * @param str
     * @return boolean
     */
    public static final String FIRM_CODE = "^\\d{9}$";
    public static boolean isFIRM_CODE(String str) {return Regular(str, FIRM_CODE);}

    /**
     * description: validate a longitude value (0 to 180 degrees, up to 8 decimal places, optional minus sign)
     * @param str
     * @return boolean
     */
    public static final String LONGITUDE = "^-?((\\d{1,2})|(1[0-7]\\d)|180)(\\.\\d{1,8})?$";

    //public static final String LONGITUDE ="^([-]?(\\d|([1-9]\\d)|(1[0-7]\\d)|(180))(\\.\\d*)\\,[-]?(\\d|([1-8]\\d)|(90))(\\.\\d*))$";
    public static boolean isLONGITUDE(String str) {return Regular(str, LONGITUDE);}

    /**
     * description: validate a latitude value (0 to 90 degrees, up to 8 decimal places, optional minus sign)
     * @param str
     * @return boolean
     */
    public static final String LATITUDE = "^-?(([0-8]?\\d)|90)(\\.\\d{1,8})?$";
    public static boolean isLATITUDE(String str) {return Regular(str, LATITUDE);}

    public static void main(String[] args) {
        boolean bool = isLATITUDE("26.0615854");
        System.out.println(bool);
    }
}

11. thread/ThreadPoolManager.java: thread pool manager singleton

package com.hsiehchou.common.thread;

import java.io.Serializable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Thread pool manager singleton.
 * By default it creates a newCachedThreadPool (a cached thread pool);
 * a fixed-size pool can be obtained by passing a thread count (newFixedThreadPool).
 */
public class ThreadPoolManager implements Serializable {

    private static final long serialVersionUID = 1465361469484903956L;
    public static final ThreadPoolManager threadPoolManager =  new ThreadPoolManager();

    private static ThreadPoolManager tpm;

    private transient ExecutorService newCachedThreadPool;
    private transient ExecutorService newFixedThreadPool;

    private int poolCapacity;

    private ThreadPoolManager(){
        if( newCachedThreadPool == null )
            newCachedThreadPool = Executors.newCachedThreadPool();
    }

    @Deprecated
    public static ThreadPoolManager getInstance(){
        if( tpm == null ){
            synchronized(ThreadPoolManager.class){
            if( tpm == null )
                tpm =  new ThreadPoolManager();
            }
        }
        return tpm;
    }

    /**
     * Returns the newCachedThreadPool
     */
    public ExecutorService getExecutorService(){
        if( newCachedThreadPool == null ){
            synchronized(ThreadPoolManager.class){
                if( newCachedThreadPool == null )
                    newCachedThreadPool = Executors.newCachedThreadPool();
            }
        }
        return newCachedThreadPool;
    }

    /**
     * Returns the newFixedThreadPool
     */
    public ExecutorService getExecutorService(int poolCapacity){
        return getExecutorService(poolCapacity, false);
    }

    /**
     * Returns the newFixedThreadPool
     */
    public synchronized ExecutorService getExecutorService(int poolCapacity, boolean closeOld){
        if(newFixedThreadPool == null || (this.poolCapacity != poolCapacity)){
            if(newFixedThreadPool != null && closeOld){
                newFixedThreadPool.shutdown();
            }
            newFixedThreadPool = Executors.newFixedThreadPool(poolCapacity);
            this.poolCapacity = poolCapacity;
        }
        return newFixedThreadPool;
    }
}
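
A short usage sketch of the manager above, submitting tasks to the cached pool and to a fixed-size pool:

import com.hsiehchou.common.thread.ThreadPoolManager;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class ThreadPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Cached pool from the singleton instance
        ExecutorService cached = ThreadPoolManager.threadPoolManager.getExecutorService();
        cached.submit(() -> System.out.println("running in " + Thread.currentThread().getName()));

        // Fixed pool with 4 worker threads
        ExecutorService fixed = ThreadPoolManager.threadPoolManager.getExecutorService(4);
        for (int i = 0; i < 8; i++) {
            final int taskId = i;
            fixed.submit(() -> System.out.println("task " + taskId));
        }

        cached.shutdown();
        fixed.shutdown();
        fixed.awaitTermination(5, TimeUnit.SECONDS);
    }
}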

12. time/TimeTranstationUtils.java: time conversion utility class

package com.hsiehchou.common.time;

import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

/**
 * Description: time conversion utility class
 */
public class TimeTranstationUtils {

    private static final Logger logger = LoggerFactory.getLogger(TimeTranstationUtils.class);

/*    private static SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
    private static SimpleDateFormat sdFormatternew = new SimpleDateFormat("yyyyMMddHH");
    private static SimpleDateFormat sdFormatter1 = new SimpleDateFormat("yyyy-MM-dd");
    private static SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    private static SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");*/

    private static Date nowTime;

    public static String Date2yyyyMMddHHmmss() {
        SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
        nowTime = new Date(System.currentTimeMillis());
        String time = sdFormatter.format(nowTime);
        return time;
    }

    public static String Date2yyyyMMddHHmmss(long timestamp) {
        SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
        nowTime = new Date(timestamp);
        String time = sdFormatter.format(nowTime);
        return time;
    }

    public static String Date2yyyyMMdd(long timestamp) {
        SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMdd");
        nowTime = new Date(timestamp);
        String time = sdFormatter.format(nowTime);
        return time;
    }

    public static String Date2yyyyMMddHH(String str) {
        SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
        SimpleDateFormat sdFormatternew = new SimpleDateFormat("yyyyMMddHH");
        try {
            nowTime = sdFormatter.parse(str);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        String time = sdFormatternew.format(nowTime);
        return time;
    }

    public static String Date2yyyy_MM_dd() {
        SimpleDateFormat sdFormatter1 = new SimpleDateFormat("yyyy-MM-dd");
        nowTime = new Date(System.currentTimeMillis());
        String time = sdFormatter1.format(nowTime);
        return time;
    }

    public static String Date2yyyy_MM_dd_HH_mm_ss() {
        SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        nowTime = new Date(System.currentTimeMillis());
        String time = sdFormatter2.format(nowTime);
        return time;
    }

    public static String Date2yyyyMMdd() {
        SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");
        nowTime = new Date(System.currentTimeMillis());
        String time = sdFormatter3.format(nowTime);
        return time;
    }

    public static String Date2yyyyMMdd(String str) {
        SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
        SimpleDateFormat sdFormatter3 = new SimpleDateFormat("yyyyMMdd");
        try {
            nowTime = sdFormatter.parse(str);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        String time = sdFormatter3.format(nowTime);
        return time;
    }

    public static Long Date2yyyyMMddHHmmssToLong() {
        return System.currentTimeMillis() / 1000;
    }

    public static String long2date(String capturetime){
        SimpleDateFormat sdf= new SimpleDateFormat("yyyyMMdd");
        // capturetime is in seconds; multiply by 1000 to get milliseconds, then convert to java.util.Date
        Date dt = new Date(Long.valueOf(capturetime) * 1000);
        String sDateTime = sdf.format(dt);  // formatted as yyyyMMdd, e.g. 20060831
        return sDateTime;
    }

    public static Long yyyyMMddHHmmssToLong(String time) {
        SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
        if (StringUtils.isBlank(time)) {
            return 0L;
        } else {
            boolean isNum = time.matches("[0-9]+");
            if (isNum) {
                long long1 = 0;
                try {
                    long1 = sdFormatter.parse(time).getTime();
                } catch (ParseException e) {
                    logger.error("Failed to convert time " + time + " to long; isNum=" + isNum);
                    return 0L;
                }
                return long1 / 1000;
            }
        }
        return 0L;
    }

    public static Date yyyyMMddHHmmssToDate(String time) {
        SimpleDateFormat sdFormatter = new SimpleDateFormat("yyyyMMddHHmmss");
        if (StringUtils.isBlank(time)) {
            return new Date();
        } else {
            boolean isNum = time.matches("[0-9]+");
            if (isNum) {
                Date date = null;
                try {
                    date = sdFormatter.parse(time);
                } catch (ParseException e) {
                    logger.error("Failed to convert time " + time + " to Date; isNum=" + isNum, e);
                    System.out.println(time);
                    System.out.println(isNum);
                    e.printStackTrace();
                }
                return date;
            }
        }
        return new Date();
    }

    public static Date yyyyMMddHHmmssToDate() {
        Date date = null;
        SimpleDateFormat sdFormatter2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        try {
            date = sdFormatter2.parse(Date2yyyy_MM_dd_HH_mm_ss());
        } catch (ParseException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return date;
    }

    public static java.sql.Date strToDate(String strDate) {
        // Use "yyyy-MM-dd": upper-case MM is the month, lower-case mm would be minutes
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        Date d = null;
        try {
            d = format.parse(strDate);
        } catch (Exception e) {
            e.printStackTrace();
        }
        java.sql.Date date = new java.sql.Date(d.getTime());
        return date;
    }

    public static Long str2Long(String str){
        if(!StringUtils.isBlank(str)){
            return Long.valueOf(str);
        }else{
            return 0L;
        }
    }

    public static Double str2Double(String str){
        if(!StringUtils.isBlank(str)){
            return Double.valueOf(str);
        }else{
            return 0.0;
        }
    }

    public static HashMap<String,Object> mapString2Long(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
        String logouttime = map.get(key);
        if (!StringUtils.isBlank(logouttime)) {
            objectMap.put(key, Long.valueOf(logouttime));
        } else {
            objectMap.put(key, 0L);
        }
        return objectMap;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(long2date("1463487992"));
    }
}

IV. Resources Module Development

xz_bigdata_resources layout

Overall structure of xz_bigdata_resources

Note: right-click the resources directory here and choose Mark Directory as > Resources Root, so it becomes a resources root whose configuration files can be referenced from anywhere in the project.

1. Under resources

log4j.properties (log4j 1.x syntax, matching the slf4j-log4j12 binding declared above)

log4j.rootLogger = error,stdout,D,E

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n

log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = F://logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG 
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss}  [ %t:%r ] - [ %p ]  %m%n

log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =F://logs/error.log 
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR 
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss}  [ %t:%r ] - [ %p ]  %m%n

2. common

datatype.properties

# base = datatype,idcard,name,age,collecttime,imei

# wechat = datatype,wechat,phone,collecttime,imei

wechat = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time

mail = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,send_mail,send_time,accept_mail,accept_time,mail_content,mail_type

qq = imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time

mysql.properties

db_ip = 192.168.116.201
db_port = 3306
user = root
password = root

3. es

es_cluster.properties

es.cluster.name=xz_es
es.cluster.nodes = hadoop1,hadoop2,hadoop3
es.cluster.nodes1 = hadoop1
es.cluster.nodes2 = hadoop2
es.cluster.nodes3 = hadoop3

es.cluster.tcp.port = 9300
es.cluster.http.port = 9200

mapping/base.json

{
  "_source": {
    "enabled": true
  },
  "properties": {
    "datatype":{"type": "keyword"},
    "idcard":{"type": "keyword"},
    "name":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "age":{"type": "long"},
    "collecttime":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "imei":{"type": "keyword"}
  }
}
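
For illustration, this mapping can be installed by creating an index over the HTTP port configured above (9200). The sketch below uses only the JDK, PUTs the mapping wrapped in a type named "base", and assumes an Elasticsearch version that still accepts mapping types (5.x/6.x); the project's own code presumably goes through the Java client on the TCP port 9300 instead:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class CreateBaseIndexDemo {

    public static void main(String[] args) throws Exception {
        // read mapping/base.json from the resources module
        String mapping;
        try (InputStream in = CreateBaseIndexDemo.class.getClassLoader()
                     .getResourceAsStream("mapping/base.json");
             Scanner sc = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            mapping = sc.hasNext() ? sc.next() : "";
        }

        // index-creation body: the field mapping wrapped in a type, assumed here to be "base"
        String body = "{\"mappings\":{\"base\":" + mapping + "}}";

        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://hadoop1:9200/base").openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}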

mapping/fieldmapping.properties

tables = wechat,mail,qq

wechat.imei = string
wechat.imsi = string
wechat.longitude = double
wechat.latitude = double
wechat.phone_mac = string
wechat.device_mac = string
wechat.device_number = string
wechat.collect_time = long
wechat.username = string
wechat.phone = string
wechat.object_username = string
wechat.send_message = string
wechat.accept_message = string
wechat.message_time = long
wechat.id = string
wechat.table = string
wechat.filename = string
wechat.absolute_filename  = string


mail.imei = string
mail.imsi = string
mail.longitude = double
mail.latitude = double
mail.phone_mac = string
mail.device_mac = string
mail.device_number = string
mail.collect_time = long
mail.send_mail = string
mail.send_time = long
mail.accept_mail = string
mail.accept_time = long
mail.mail_content = string
mail.mail_type = string
mail.id = string
mail.table = string
mail.filename = string
mail.absolute_filename  = string

qq.imei = string
qq.imsi = string
qq.longitude = double
qq.latitude = double
qq.phone_mac = string
qq.device_mac = string
qq.device_number = string
qq.collect_time = long
qq.username = string
qq.phone = string
qq.object_username = string
qq.send_message = string
qq.accept_message = string
qq.message_time = long
qq.id = string
qq.table = string
qq.filename = string
qq.absolute_filename  = string
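
These type declarations line up with the JSON mapping files that follow; a rough sketch of that conversion (string becomes text with a keyword sub-field, long and double pass through) is shown below. The method name and output layout are illustrative only, and note that the real mapping files also map some string fields, such as imei and imsi, to plain keyword:

import java.util.LinkedHashMap;
import java.util.Map;

public class FieldTypeToMappingDemo {

    /** Translate one declared field type into an ES field mapping snippet (sketch only). */
    static String esType(String declared) {
        switch (declared) {
            case "long":
                return "{\"type\": \"long\"}";
            case "double":
                return "{\"type\": \"double\"}";
            default:   // "string": text plus a keyword sub-field, as in the JSON files below
                return "{\"type\": \"text\",\"fields\": {\"keyword\": "
                        + "{\"ignore_above\": 256,\"type\": \"keyword\"}}}";
        }
    }

    public static void main(String[] args) {
        // a few wechat fields and their declared types, copied from fieldmapping.properties
        Map<String, String> declared = new LinkedHashMap<>();
        declared.put("longitude", "double");
        declared.put("collect_time", "long");
        declared.put("send_message", "string");

        StringBuilder props = new StringBuilder("{\n  \"properties\": {\n");
        int i = 0;
        for (Map.Entry<String, String> e : declared.entrySet()) {
            props.append("    \"").append(e.getKey()).append("\":").append(esType(e.getValue()));
            props.append(++i < declared.size() ? ",\n" : "\n");
        }
        props.append("  }\n}");
        System.out.println(props);
    }
}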

mapping/mail.json

{
  "_source": {
    "enabled": true
  },
  "properties": {
    "imei":{"type": "keyword"},
    "imsi":{"type": "keyword"},
    "longitude":{"type": "double"},
    "latitude":{"type": "double"},
    "phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "collect_time":{"type": "long"},
    "send_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "send_time":{"type": "long"},
    "accept_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "accept_time":{"type": "long"},
    "mail_content":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "mail_type":{"type": "keyword"},
     "id":{"type": "keyword"},
    "table":{"type": "keyword"},
    "filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
  }
}

mapping/qq.json

{
  "_source": {
    "enabled": true
  },
  "properties": {
    "imei":{"type": "keyword"},
    "imsi":{"type": "keyword"},
    "longitude":{"type": "double"},
    "latitude":{"type": "double"},
    "phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "collect_time":{"type": "long"},
    "username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "message_time":{"type": "long"},
    "id":{"type": "keyword"},
    "table":{"type": "keyword"},
    "filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
  }
}

mapping/test.json

{
  "_source": {
    "enabled": true
  },
  "properties": {
    "id":{"type": "keyword"},
    "source":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "target":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "library_id":{"type": "long"},
    "source_sign":{"type": "keyword"},
    "target_sign":{"type": "keyword"},
    "create_time":{"type": "long"},
    "create_user_id":{"type": "keyword"},
    "is_audit":{"type": "long"},
    "is_del":{"type": "long"},
    "last_modify_user_id":{"type": "keyword"},
    "last_modify_time":{"type": "long"},
    "init_version":{"type": "long"},
    "version":{"type": "long"},
    "score":{"type": "keyword"},
    "level":{"type": "keyword"},
    "example":{"type": "keyword"},
    "conflict":{"type": "keyword"},
    "srcLangId":{"type": "long"},
    "srcLangCN":{"type": "keyword"},
    "tarLangId":{"type": "long"},
    "tarLangCN":{"type": "keyword"},
    "docId":{"type": "keyword"},
    "source_simhash":{"type": "keyword"},
    "sentence_id":{"type": "long"},
    "section_id":{"type": "long"},
    "type":{"type": "long"},
    "industry":{"type": "long"},
    "industry_name":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "querycount":{"type": "long"},
    "reviser":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "comment":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
  }
}

mapping/wechat.json

{
  "_source": {
    "enabled": true
  },
  "properties": {
    "imei":{"type": "keyword"},
    "imsi":{"type": "keyword"},
    "longitude":{"type": "double"},
    "latitude":{"type": "double"},
    "phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "collect_time":{"type": "long"},
    "username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "message_time":{"type": "long"},
    "id":{"type": "keyword"},
    "table":{"type": "keyword"},
    "filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
  }
}

4、flume

datatype.properties

flume-config.properties

#kafka topic
kafkatopic=test100

validation.properties

# 文件名验证开关
FILENAME_VALIDATION=1

# DATATYPE转换开关
DATATYPE_TRANSACTION=1

# 经纬度验证开关
LONGLAIT_VALIDATION=1

# 是否入错误数据到ES
ERROR_ES=1
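
As an illustration of the LONGLAIT_VALIDATION switch, a minimal coordinate check might look like the sketch below (the method name and the 1/0-to-boolean reading of the switch are assumptions; the project's real validation presumably lives in the Flume interceptor):

public class ValidationDemo {

    // mirrors LONGLAIT_VALIDATION=1 in validation.properties (1 = on, 0 = off is an assumption)
    private static final boolean LONGLAIT_VALIDATION = true;

    /** True when the pair is a plausible coordinate: lon in [-180,180], lat in [-90,90]. */
    static boolean validLongLat(String longitude, String latitude) {
        if (!LONGLAIT_VALIDATION) {
            return true;                    // validation switched off
        }
        try {
            double lon = Double.parseDouble(longitude);
            double lat = Double.parseDouble(latitude);
            return lon >= -180 && lon <= 180 && lat >= -90 && lat <= 90;
        } catch (NumberFormatException e) {
            return false;                   // not numeric at all
        }
    }

    public static void main(String[] args) {
        System.out.println(validLongLat("116.403963", "39.915119"));   // true
        System.out.println(validLongLat("181.0", "39.915119"));        // false
    }
}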

5、hadoop

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:8020</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>DEFAULT</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.mapred.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.mapred.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  </property>
  <property>
    <name>hadoop.security.instrumentation.requires.admin</name>
    <value>false</value>
  </property>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>hadoop.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.ssl.require.client.cert</name>
    <value>false</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.keystores.factory.class</name>
    <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.server.conf</name>
    <value>ssl-server.xml</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.client.conf</name>
    <value>ssl-client.xml</value>
    <final>true</final>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///dfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address</name>
    <value>hadoop1:8022</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>hadoop1:50470</value>
  </property>
  <property>
    <name>dfs.https.port</name>
    <value>50470</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>022</value>
  </property>
  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.use.legacy.blockreader</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hdfs-sockets/dn</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.domain.socket.data.traffic</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>

6、hbase

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:8020</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>DEFAULT</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.mapred.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.mapred.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  </property>
  <property>
    <name>hadoop.security.instrumentation.requires.admin</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.ssl.require.client.cert</name>
    <value>false</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.keystores.factory.class</name>
    <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.server.conf</name>
    <value>ssl-server.xml</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.client.conf</name>
    <value>ssl-client.xml</value>
    <final>true</final>
  </property>
</configuration>

hbase-server-config.properties

#hbase  开发环境
need.init.hbase=true
# hbase.zookeeper.quorum=hadoop1.ultiwill.com,hadoop2.ultiwill.com,hadoop3.ultiwill.com
hbase.zookeeper.quorum=hadoop1,hadoop2,hadoop3
hbase.zookeeper.property.clientPort=2181
hbase.rpc.timeout=120000
hbase.client.scanner.timeout.period=120000
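
A minimal sketch of opening an HBase connection with these settings, using the standard HBase 1.x client API (the "test" table name is only an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnDemo {

    public static void main(String[] args) throws Exception {
        // the same values as hbase-server-config.properties
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop1,hadoop2,hadoop3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.rpc.timeout", "120000");
        conf.set("hbase.client.scanner.timeout.period", "120000");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            System.out.println("test table exists: " + admin.tableExists(TableName.valueOf("test")));
        }
    }
}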

hbase-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop1:8020/hbase</value>
  </property>
  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.client.write.buffer</name>
    <value>2097152</value>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>100</value>
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>35</value>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
  </property>
  <property>
    <name>hbase.client.keyvalue.maxsize</name>
    <value>10485760</value>
  </property>
  <property>
    <name>hbase.ipc.client.allowsInterrupt</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.client.primaryCallTimeout.get</name>
    <value>10</value>
  </property>
  <property>
    <name>hbase.client.primaryCallTimeout.multiget</name>
    <value>10</value>
  </property>
  <property>
    <name>hbase.fs.tmp.dir</name>
    <value>/user/${user.name}/hbase-staging</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
  </property>
  <property>
    <name>hbase.regionserver.thrift.http</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.thrift.support.proxyuser</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.snapshot.master.timeoutMillis</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.snapshot.region.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.snapshot.master.timeout.millis</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>hbase.rpc.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
  <property>
    <name>zookeeper.znode.rootserver</name>
    <value>root-region-server</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop1,hadoop3,hadoop2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.rest.ssl.enabled</name>
    <value>false</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///dfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address</name>
    <value>hadoop1:8022</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>hadoop1:50470</value>
  </property>
  <property>
    <name>dfs.https.port</name>
    <value>50470</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>022</value>
  </property>
  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.use.legacy.blockreader</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hdfs-sockets/dn</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.domain.socket.data.traffic</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>

7、kafka

kafka-data-push-info

--config                              Directory holding the auto-push configuration files
--timeOut                             Push timeout in minutes (default: 15)

Auto-push configuration keys:
data.sources                          List of data sources, e.g. data.sources=bhdb1,dpxx

{source}.source.type                  Type of the source. Sources are either databases or files: for a database, use the database product name (e.g. oracle, mysql, sqlserver); otherwise use file.
                                      e.g. bhdb1.source.type=oracle or dpxx.source.type=file

Database sources:
{source}.db.name                      Database name
{source}.db.host                      Database IP address or hostname
{source}.db.port                      Database port; if omitted, the default port of that database is used
{source}.db.user                      User name
{source}.db.pwd                       Password
{source}.push.topic                   Global target topic for this database: tables without their own topic push their data here
{source}.push.tables                  List of tables to push
{source}.{table}.push.sql             Only push the rows returned by this SQL; if omitted, everything is pushed
{source}.{table}.push.adjusterfactory Adjusts the pushed data; must be a subclass of com.bh.d406.bigdata.kafka.producer.DataAdjuster; set only when adjustment is needed
{source}.{table}.push.topic           Target topic for this table; if omitted, the global topic is used

File sources:
{source}.file.dir                     File directory (note: only local directories are supported)
{source}.file.encoding                File encoding (default: UTF-8)
{source}.file.extensions              List of file extensions to pick up
{source}.file.data.loaderfactory      File loader factory class
{source}.file.data.fields             Field list of a record; the order matters
{source}.file.data.spliter            Field separator (default: \t)
{source}.file.skip.firstline          Whether to skip the first line: true or false
{source}.file.data.adjusterfactory    Data adjuster factory class
{source}.push.thread.num              Number of file-reading threads
{source}.push.batch.size              Batch size when pushing data in batches
{source}.push.topic                   Target Kafka topic
{source}.store.table                  Table name used for storage
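
Purely as an illustration of these keys, a hypothetical file-source entry (every value below is made up for this example) could look like:

# hypothetical example: push tab-separated files from a local directory to Kafka
data.sources=dpxx

dpxx.source.type=file
dpxx.file.dir=/usr/chl/data/push
dpxx.file.encoding=UTF-8
dpxx.file.extensions=txt,bcp
dpxx.file.data.fields=imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
dpxx.file.data.spliter=\t
dpxx.file.skip.firstline=false
dpxx.push.thread.num=2
dpxx.push.batch.size=500
dpxx.push.topic=test100
dpxx.store.table=wechat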

kafka-server-config.properties

#################Kafka 全局配置 #######################
# 格式为host1:port1,host2:port2,
# 这是一个broker列表,用于获得元数据(topics,partitions和replicas),建立起来的socket连接用于发送实际数据,
# 这个列表可以是broker的一个子集,或者一个VIP,指向broker的一个子集
# metadata.broker.list=hadoop1:9092,slaver01:9092,slaver02:9092
metadata.broker.list=hadoop1:9092

# zookeeper列表
zk.connect=hadoop1:2181,hadoop2:2181,hadoop3:2181

# Serializer class for messages. The default encoder simply takes a byte[] and returns the same byte[].
# Default: kafka.serializer.DefaultEncoder
serializer.class=kafka.serializer.StringEncoder

# 用来控制一个produce请求怎样才能算完成,准确的说,是有多少broker必须已经提交数据到log文件,并向leader发送ack,可以设置如下的值:
# 0,意味着producer永远不会等待一个来自broker的ack,这就是0.7版本的行为。这个选项提供了最低的延迟,但是持久化的保证是最弱的,当server挂掉的时候会丢失一些数据。
# 1,意味着在leader replica已经接收到数据后,producer会得到一个ack。这个选项提供了更好的持久性,因为在server确认请求成功处理后,client才会返回。如果刚写到leader上,还没来得及复制leader就挂了,那么消息才可能会丢失。
# -1,意味着在所有的ISR都接收到数据后,producer才得到一个ack。这个选项提供了最好的持久性,只要还有一个replica存活,那么数据就不会丢失。
# 默认值  为 0
request.required.acks=1

# 请求超时时间     默认为 10000
request.timeout.ms=60000

# Whether messages are sent asynchronously in a background thread.
# Valid values: async (asynchronous) and sync (synchronous).
# async allows requests to be batched and gives higher throughput, but data that has not yet been sent is lost if the client machine dies.
# Default: sync
producer.type=sync
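
These property names belong to the legacy (0.8.x) producer API, so a matching send sketch would look like the following. The topic test100 is taken from flume-config.properties; this only illustrates the configuration above and is not necessarily how the project's own Kafka module is written:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class LegacyProducerDemo {

    public static void main(String[] args) {
        // the same keys as kafka-server-config.properties
        Properties props = new Properties();
        props.put("metadata.broker.list", "hadoop1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        props.put("producer.type", "sync");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("test100", "hello from the legacy producer"));
        producer.close();
    }
}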

8、redis

redis.properties

redis.hostname = 192.168.116.202
redis.port  = 6379
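
A tiny round-trip sketch against this Redis instance, assuming the Jedis client (the key name is arbitrary):

import redis.clients.jedis.Jedis;

public class RedisDemo {

    public static void main(String[] args) {
        // host and port from redis.properties
        Jedis jedis = new Jedis("192.168.116.202", 6379);
        jedis.set("xz:ping", "pong");
        System.out.println(jedis.get("xz:ping"));   // pong
        jedis.close();
    }
}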

9、spark

hive_fields_mapping.properties

datatype= base,wechat

#base = datatype,idcard,name,age,collecttime,imei
#wechat = datatype,wechat,phone,collecttime,imei
#============================================================base
base.datatype = string
base.idcard = string
base.name = string
base.age = long
base.collecttime = string
base.imei = string
#============================================================wechat
wechat.datatype = string
wechat.wechat = string
wechat.phone = string
wechat.collecttime = string
wechat.imei = string
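
To make the intent of this mapping concrete, the sketch below turns the base field list into a Hive CREATE TABLE statement (long is rendered as bigint; the delimiter clause and table layout are assumptions for illustration, not the project's actual DDL):

import java.util.LinkedHashMap;
import java.util.Map;

public class HiveDdlDemo {

    /** Map a declared type to a Hive column type (sketch: long -> bigint, others unchanged). */
    static String hiveType(String t) {
        return "long".equals(t) ? "bigint" : t;
    }

    public static void main(String[] args) {
        // the "base" fields and types from hive_fields_mapping.properties
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("datatype", "string");
        fields.put("idcard", "string");
        fields.put("name", "string");
        fields.put("age", "long");
        fields.put("collecttime", "string");
        fields.put("imei", "string");

        StringBuilder ddl = new StringBuilder("CREATE TABLE IF NOT EXISTS base (");
        int i = 0;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            ddl.append(e.getKey()).append(' ').append(hiveType(e.getValue()));
            ddl.append(++i < fields.size() ? ", " : ")");
        }
        ddl.append(" ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
        System.out.println(ddl);
    }
}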

relation.properties

#需要关联的字段
relationfield = phone_mac,phone,username,send_mail,imei,imsi

complex_relationfield = card,phone_mac,phone,username,send_mail,imei,imsi

spark-batch-config.properties

# spark 常规 配置   不包括 流式处理的 配置

#################### 全局  #############################
# 在用户没有指定时,用于分布式随机操作(groupByKey,reduceByKey等等)的默认的任务数( shuffle过程中 task的个数 )
# 默认为 8
spark.default.parallelism=16

# Spark用于缓存的内存大小所占用的Java堆的比率。这个不应该大于JVM中老年代所分配的内存大小
# 默认情况下老年代大小是堆大小的2/3,但是你可以通过配置你的老年代的大小,然后再去增加这个比率
# 默认为 0.66
# spark 1.6 后 过期
# spark.storage.memoryFraction=0.66

# 在spark1.6.0版本默认大小为: (“Java Heap” – 300MB) * 0.75
# 例如:如果堆内存大小有4G,将有2847MB的Spark Memory,Spark Memory=(4*1024MB-300)*0.75=2847MB
# 这部分内存会被分成两部分:Storage Memory和Execution Memory
# 而且这两部分的边界由spark.memory.storageFraction参数设定,默认是0.5即50%
# 新的内存管理模型中的优点是,这个边界不是固定的,在内存压力下这个边界是可以移动的
# 如一个区域内存不够用时可以从另一区域借用内存
spark.memory.fraction=0.75
spark.memory.storageFraction=0.5

# Whether to compress serialized RDD partitions (e.g. StorageLevel.MEMORY_ONLY_SER).
# Costs a little extra CPU time but can greatly reduce the space used.
# Default: false
spark.rdd.compress=true

# The codec used to compress internal data such as RDD partitions,
# broadcast variables and shuffle outputs. By default,
# Spark provides three codecs: lz4, lzf, and snappy. You can also use fully qualified class names to specify the codec,
# e.g.
# 1. org.apache.spark.io.LZ4CompressionCodec, 
# 2. org.apache.spark.io.LZFCompressionCodec, 
# 3. org.apache.spark.io.SnappyCompressionCodec.   default
spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec

# Block size (in bytes) used in Snappy compression,
# in the case when Snappy compression codec is used.
# Lowering this block size will also lower shuffle memory usage when Snappy is used.
# default : 32K
spark.io.compression.snappy.blockSize=32768


# 同时获取每一个分解任务的时候,映射输出文件的最大的尺寸(以兆为单位)。
# 由于对每个输出都需要我们去创建一个缓冲区去接受它,这个属性值代表了对每个分解任务所使用的内存的一个上限值,
# 因此除非你机器内存很大,最好还是配置一下这个值。
# 默认48
spark.reducer.maxSizeInFlight=48

# 这个配置参数仅适用于HashShuffleMananger的实现,同样是为了解决生成过多文件的问题,
# 采用的方式是在不同批次运行的Map任务之间重用Shuffle输出文件,也就是说合并的是不同批次的Map任务的输出数据,
# 但是每个Map任务所需要的文件还是取决于Reduce分区的数量,因此,它并不减少同时打开的输出文件的数量,
# 因此对内存使用量的减少并没有帮助。只是HashShuffleManager里的一个折中的解决方案。
# 默认为false
#spark.shuffle.consolidateFiles=false

#java.io.Externalizable. Java serialization is flexible but often quite slow, and leads to large serialized formats for many classes.
#default java.io.Serializable
#spark.serializer=org.apache.spark.serializer.KryoSerializer

# Speculation是在任务调度的时候,如果没有适合当前本地性要求的任务可供运行,
# 将跑得慢的任务在空闲计算资源上再度调度的行为,这些参数调整这些行为的频率和判断指标,默认是不使用Speculation的
# 默认为false
# 慎用   可能导致数据重复的现象
#spark.speculation=true

# task失败重试次数
# 默认为4
spark.task.maxFailures=8

# Spark 是有任务的黑名单机制的,但是这个配置在官方文档里面并没有写,可以设置下面的参数,
# 比如设置成一分钟之内不要再把任务发到这个 Executor 上了,单位是毫秒。
# spark.scheduler.executorTaskBlacklistTime=60000

# 超过这个时间,可以执行 NODE_LOCAL 的任务
# 默认为 3000
spark.locality.wait.process=1

# 超过这个时间,可以执行 RACK_LOCAL 的任务
# 默认为 3000
spark.locality.wait.node=3 

# 超过这个时间,可以执行 ANY 的任务
# 默认为 3000
spark.locality.wait.rack=1000

#################### yarn  ###########################

# 提交的jar文件  的副本数
# 默认为 3
spark.yarn.submit.file.replication=1

# container中的线程数
# 默认为 25
spark.yarn.containerLauncherMaxThreads=25

# 解决yarn-cluster模式下 对处理  permGen space oom异常很有用
# spark.yarn.am.extraJavaOptions=
# spark.driver.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024M

# 对象指针压缩 和 gc日志收集打印
# spark.executor.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024M -XX:MaxDirectMemorySize=1536M -XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
# -XX:-UseGCOverheadLimit
# GC默认情况下有一个限制,默认是GC时间不能超过2%的CPU时间,但是如果大量对象创建(在Spark里很容易出现,代码模式就是一个RDD转下一个RDD),
# 就会导致大量的GC时间,从而出现OutOfMemoryError: GC overhead limit exceeded,可以通过设置-XX:-UseGCOverheadLimit关掉它。
# -XX:+UseCompressedOops  可以压缩指针(8字节变成4字节)
spark.executor.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=1024m -XX:+CMSClassUnloadingEnabled -Xmn512m -XX:MaxTenuringThreshold=15 -XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCompressedOops -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError

# 当shuffle缓存的数据超过此值  强制刷磁盘  单位为 byte
# spark.shuffle.spill.initialMemoryThreshold=671088640

################### AKKA 相关 ##########################

# 在控制面板通信(序列化任务和任务结果)的时候消息尺寸的最大值,单位是MB。
# 如果你需要给驱动器发回大尺寸的结果(比如使用在一个大的数据集上面使用collect()方法),那么你就该增加这个值了。
# 默认为 10
spark.akka.frameSize=1024

# 用于通信的actor线程数量。如果驱动器有很多CPU核心,那么在大集群上可以增大这个值。
# 默认为 4
spark.akka.threads=8

# Spark节点之间通信的超时时间,以秒为单位
# 默认为20s
spark.akka.timeout=120

# exector的堆外内存(不会占用 分配给executor的jvm内存)
# spark.yarn.executor.memoryOverhead=2560

spark-start-config.properties

# Spark 任务 使用java -cp 方式启动的参数配置
#
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/spark-assembly.jar
spark.authenticate=false
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.yarn.historyServer.address=http://BH-LAN-Virtual-hadoop-9:18088
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/lib/native
spark.eventLog.enabled=true
spark.dynamicAllocation.schedulerBacklogTimeout=1
SPARK_SUBMIT=true
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.ui.killEnabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.executorIdleTimeout=60
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.shuffle.service.port=7337
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.dynamicAllocation.enabled=true

#/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/*
#/etc/spark/conf.cloudera.spark_on_yarn/
#/etc/hadoop/conf.cloudera.yarn/

spark.submit.deployMode=client
spark.app.name=default
spark.master=yarn-client
spark.driver.memory=1g
spark.executor.instances=1
spark.executor.memory=4g
spark.executor.cores=2
spark.jars=

spark-streaming-config.properties

# spark  流式处理的 配置

# job的并行度
# 默认为 1
spark.streaming.concurrentJobs=1

# Spark记忆任何元数据(stages生成,任务生成等等)的时间(秒)。周期性清除保证在这个时间之前的元数据会被遗忘。
#当长时间几小时,几天的运行Spark的时候设置这个是很有用的。注意:任何内存中的RDD只要过了这个时间就会被清除掉。
# 默认 disable
spark.cleaner.ttl=3600

# 将不再使用的缓存数据清除
# 默认为false
spark.streaming.unpersist=true

# 从网络中批量接受对象时的持续时间 , 单位  ms。
# 默认为200ms
spark.streaming.blockInterval=200

# Caps the rate at which each receiver ingests data (records per second).
# If the source suddenly produces far more data than the job can process, streaming falls behind, so capping the maximum throughput is a useful safeguard.
# Default: 100000
spark.streaming.receiver.maxRate=10000

# kafka每个分区最大的读取速度   单位 s
# 控制kafka读取的量
spark.streaming.kafka.maxRatePerPartition=50

# 读取kafka的分区最新offset的最大尝试次数
# 默认为1
spark.streaming.kafka.maxRetries=5

# 1、为什么引入Backpressure
# 默认情况下,Spark Streaming通过Receiver以生产者生产数据的速率接收数据,计算过程中会出现batch processing time > batch interval的情况,
# 其中batch processing time 为实际计算一个批次花费时间, batch interval为Streaming应用设置的批处理间隔。
# 这意味着Spark Streaming的数据接收速率高于Spark从队列中移除数据的速率,也就是数据处理能力低,在设置间隔内不能完全处理当前接收速率接收的数据。
# 如果这种情况持续过长的时间,会造成数据在内存中堆积,导致Receiver所在Executor内存溢出等问题(如果设置StorageLevel包含disk, 则内存存放不下的数据会溢写至disk, 加大延迟)。
# Spark 1.5以前版本,用户如果要限制Receiver的数据接收速率,可以通过设置静态配制参数“spark.streaming.receiver.maxRate”的值来实现,
# 此举虽然可以通过限制接收速率,来适配当前的处理能力,防止内存溢出,但也会引入其它问题。比如:producer数据生产高于maxRate,当前集群处理能力也高于maxRate,这就会造成资源利用率下降等问题。
# 为了更好的协调数据接收速率与资源处理能力,Spark Streaming 从v1.5开始引入反压机制(back-pressure),通过动态控制数据接收速率来适配集群数据处理能力。
# 2、Backpressure
# Spark Streaming Backpressure:  根据JobScheduler反馈作业的执行信息来动态调整Receiver数据接收率。
# 通过属性“spark.streaming.backpressure.enabled”来控制是否启用backpressure机制,默认值false,即不启用
spark.streaming.backpressure.enabled=true
spark.streaming.backpressure.initialRate=200

datatype/fieldtype.properties

hive/hive-server-config.properties

# hive development environment

hive/hive-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop1:9083</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.join</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.join.noconditionaltask.size</name>
    <value>20971520</value>
  </property>
  <property>
    <name>hive.optimize.bucketmapjoin.sortedmerge</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.smbjoin.cache.rows</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.logging.operation.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/hadoop_log/log/hive/operation_logs</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>67108864</value>
  </property>
  <property>
    <name>hive.exec.copyfile.maxsize</name>
    <value>33554432</value>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>1099</value>
  </property>
  <property>
    <name>hive.vectorized.groupby.checkinterval</name>
    <value>4096</value>
  </property>
  <property>
    <name>hive.vectorized.groupby.flush.percent</name>
    <value>0.1</value>
  </property>
  <property>
    <name>hive.compute.query.using.stats</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.vectorized.execution.reduce.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.merge.mapfiles</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.cbo.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>minimal</value>
  </property>
  <property>
    <name>hive.fetch.task.conversion.threshold</name>
    <value>268435456</value>
  </property>
  <property>
    <name>hive.limit.pushdown.memory.usage</name>
    <value>0.1</value>
  </property>
  <property>
    <name>hive.merge.sparkfiles</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.merge.smallfiles.avgsize</name>
    <value>16777216</value>
  </property>
  <property>
    <name>hive.merge.size.per.task</name>
    <value>268435456</value>
  </property>
  <property>
    <name>hive.optimize.reducededuplication</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.optimize.reducededuplication.min.reducer</name>
    <value>4</value>
  </property>
  <property>
    <name>hive.map.aggr</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.map.aggr.hash.percentmemory</name>
    <value>0.5</value>
  </property>
  <property>
    <name>hive.optimize.sort.dynamic.partition</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>1369020825</value>
  </property>
  <property>
    <name>spark.driver.memory</name>
    <value>966367641</value>
  </property>
  <property>
    <name>spark.executor.cores</name>
    <value>1</value>
  </property>
  <property>
    <name>spark.yarn.driver.memoryOverhead</name>
    <value>102</value>
  </property>
  <property>
    <name>spark.yarn.executor.memoryOverhead</name>
    <value>230</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.initialExecutors</name>
    <value>1</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.minExecutors</name>
    <value>1</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.maxExecutors</name>
    <value>2147483647</value>
  </property>
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>hadoop1,hadoop3,hadoop2</value>
  </property>
  <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
  </property>
  <property>
    <name>hive.zookeeper.namespace</name>
    <value>hive_zookeeper_namespace_hive</value>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.use.SSL</name>
    <value>false</value>
  </property>
  <property>
    <name>spark.shuffle.service.enabled</name>
    <value>true</value>
  </property>
</configuration>

五.Flume开发

xz_bigdata_flume

FTP -> Flume Source -> Interceptor -> Flume Channel -> Flume Sink -> Kafka

自定义的内容有:FlumeSource、拦截器、FlumeSink

1、maven冲突解决和pom.xml

1.1 安装Maven Helper插件,在Settings里面的Plugins里面搜索Maven Helper,点击Install,安装完毕。

1.2 ETL covers extraction, transformation and loading:
①Extraction: pull the data that the target system needs out of the source system;
②Transformation: convert the extracted data into the form the target requires, cleaning and correcting erroneous or inconsistent records;
③Loading: load the transformed data into the target data store.

Flume数据处理流程

1.3 pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata2</artifactId>
        <groupId>com.hsiehchou</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_flume</artifactId>

    <name>xz_bigdata_flume</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <flume-ng.version>1.6.0</flume-ng.version>
        <hadoop.version>2.6.0</hadoop.version>
        <jdom.version>1.0</jdom.version>
        <c3p0.version>0.9.5</c3p0.version>
        <hadoop.version>2.6.0</hadoop.version>
        <mybatis.version>3.1.1</mybatis.version>
        <zookeeper.version>3.4.6</zookeeper.version>
        <net.sf.json.version>2.2.3</net.sf.json.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_resources</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_common</artifactId>
            <version>1.0-SNAPSHOT</version>
            <exclusions>
                <exclusion>
                    <artifactId>fastjson</artifactId>
                    <groupId>com.alibaba</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-configuration</artifactId>
                    <groupId>commons-configuration</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-io</artifactId>
                    <groupId>commons-io</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-lang</artifactId>
                    <groupId>commons-lang</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_kafka</artifactId>
            <version>1.0-SNAPSHOT</version>
            <exclusions>
                <exclusion>
                    <artifactId>snappy-java</artifactId>
                    <groupId>org.xerial.snappy</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>scala-library</artifactId>
                    <groupId>org.scala-lang</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>zookeeper</artifactId>
                    <groupId>org.apache.zookeeper</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-api</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>log4j</artifactId>
                    <groupId>log4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <!--flume核心依赖-->
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>${flume-ng.version}-${cdh.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-api</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>guava</artifactId>
                    <groupId>com.google.guava</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-codec</artifactId>
                    <groupId>commons-codec</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-logging</artifactId>
                    <groupId>commons-logging</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jetty</artifactId>
                    <groupId>org.mortbay.jetty</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jetty-util</artifactId>
                    <groupId>org.mortbay.jetty</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>servlet-api</artifactId>
                    <groupId>org.mortbay.jetty</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-io</artifactId>
                    <groupId>commons-io</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>commons-lang</artifactId>
                    <groupId>commons-lang</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-sdk</artifactId>
            <version>${flume-ng.version}-${cdh.version}</version>
        </dependency>

        <!--flume配置依赖-->
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-configuration</artifactId>
            <version>${flume-ng.version}-${cdh.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>guava</artifactId>
                    <groupId>com.google.guava</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>jdom</groupId>
            <artifactId>jdom</artifactId>
            <version>${jdom.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>RELEASE</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>RELEASE</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>RELEASE</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>RELEASE</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>commons-configuration</groupId>
            <artifactId>commons-configuration</artifactId>
            <version>RELEASE</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>RELEASE</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>RELEASE</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <defaultGoal>compile</defaultGoal>
        <sourceDirectory>src/main/java/</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>jars/</classpathPrefix>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <id>copy</id>
                        <phase>install</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>
                                ${project.build.directory}/jars
                            </outputDirectory>
                            <excludeArtifactIds>javaee-api</excludeArtifactIds>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

2、自定义source

2.1 继承AbstractSource 实现 Configurable, PollableSource接口

package com.hsiehchou.flume.source;

import com.hsiehchou.flume.constant.FlumeConfConstant;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.utils.FileUtilsStronger;
import org.apache.commons.io.FileUtils;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.PollableSource;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.apache.log4j.Logger;

import java.io.File;
import java.util.*;

/**
 * 固定写法,自定义Source 直接继承 AbstractSource 和 实现 Configurable, PollableSource 接口
 * 可参照官网 http://flume.apache.org/releases/content/1.9.0/FlumeDeveloperGuide.html#source
 */
public class FolderSource extends AbstractSource implements Configurable, PollableSource {

    private final Logger logger = Logger.getLogger(FolderSource.class);

    //tier1.sources.source1.sleeptime=5
    //tier1.sources.source1.filenum=3000
    //tier1.sources.source1.dirs =/usr/chl/data/filedir/
    //tier1.sources.source1.successfile=/usr/chl/data/filedir_successful/

    //以下为配置在flume.conf文件中
    //读取的文件目录
    private String dirStr;
    //读取的文件目录,如果多个,以","分割,在flume.conf里面配置
    private String[] dirs;
    //处理成功的文件写入的目录
    private String successfile;
    //睡眠时间
    private long sleeptime = 5;
    //每批文件数量
    private int filenum = 500;

    //以下为配置在txtparse.properties文件中
    //读取的所有文件集合
    private Collection<File> allFiles;

    //一批处理的文件大小
    private List<File> listFiles;
    private ArrayList<Event> eventList = new ArrayList<Event>();

    /**
     * @param context 拿到flume配置里面的所有参数
     */
    @Override
    public void configure(Context context) {
        logger.info("开始初始化flume参数");
        initFlumeParams(context);
        logger.info("初始化flume参数成功");
    }

    @Override
    public Status process() {
        //定义处理逻辑
        try {
            Thread.sleep(sleeptime * 1000);
        } catch (InterruptedException e) {
            logger.error(null, e);
        }

        Status status = null;
        try {
            // for (String dir : dirs) {
            logger.info("dirStr===========" + dirStr);


            //TODO 1.监控目录下面的所有文件
            //读取目录下的文件,获取目录下所有以 "txt", "bcp" 结尾的文件
            allFiles = FileUtils.listFiles(new File(dirStr), new String[]{"txt", "bcp"}, true);

            //如果目录下文件总数大于阈值,则只取 filenum 个文件进行处理
            if (allFiles.size() >= filenum) {
                //文件数量大于3000 只取3000条
                listFiles = ((List<File>) allFiles).subList(0, filenum);
            } else {
                //文件数量小于3000,取所有文件进行处理
                listFiles = ((List<File>) allFiles);
            }

            //TODO 2.遍历所有的文件进行解析
            if (listFiles.size() > 0) {

                for (File file : listFiles) {
                    //文件名是需要传到channel中的
                    String fileName = file.getName();

                    //解析文件  获取文件名及文件内容 文件绝对路径  文件内容
                    Map<String, Object> stringObjectMap = FileUtilsStronger.parseFile(file, successfile);

                    //返回的内容2个参数  一个是文件绝对路径  另一个是lines文件的所有内容

                    //获取文件绝对路径
                    String absoluteFilename = (String) stringObjectMap.get(MapFields.ABSOLUTE_FILENAME);

                    //获取文件内容
                    List<String> lines = (List<String>) stringObjectMap.get(MapFields.VALUE);

                    //TODO 解析出来之后,需要把解析出来的数据封装为Event
                    if (lines != null && lines.size() > 0) {

                        //遍历读取的内容
                        for (String line : lines) {

                            //封装event Header 将文件名及文件绝对路径通过header传送到channel中
                            //构建event头
                            Map<String, String> map = new HashMap<String, String>();

                            //文件名
                            map.put(MapFields.FILENAME, fileName);

                            //文件绝对路径
                            map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);

                            //构建event
                            SimpleEvent event = new SimpleEvent();

                            //把读取的一行数据转成字节
                            byte[] bytes = line.getBytes();
                            event.setBody(bytes);
                            event.setHeaders(map);
                            eventList.add(event);
                        }
                    }

                    try {
                        if (eventList.size() > 0) {
                            //获取channelProcessor
                            ChannelProcessor channelProcessor = getChannelProcessor();

                            //通过channelProcessor把eventList发送出去,可以通过拦截器进行拦截
                            channelProcessor.processEventBatch(eventList);
                            logger.info("批量推送到 拦截器 数据大小为" + eventList.size());
                        }
                        eventList.clear();
                    } catch (Exception e) {
                        eventList.clear();
                        logger.error("发送数据到channel失败", e);
                    } finally {
                        eventList.clear();
                    }
                }
            }
            // 处理成功,返回成功状态
            status = Status.READY;
            return status;
        } catch (Exception e) {
            status = Status.BACKOFF;
            logger.error("异常", e);
            return status;
        }
    }

    /**
     * 初始化flume參數
     * @param context
     */
    public void initFlumeParams(Context context) {

        //read the flume.conf configuration and initialise the parameters
        try {
            //文件处理目录

            //监控的文件目录
            dirStr = context.getString(FlumeConfConstant.DIRS);

            //监控多个目录
            dirs = dirStr.split(",");

            //成功处理的文件存放目录
            successfile = context.getString(FlumeConfConstant.SUCCESSFILE);

            //每批处理文件个数
            filenum = context.getInteger(FlumeConfConstant.FILENUM);

            //睡眠时间
            sleeptime = context.getLong(FlumeConfConstant.SLEEPTIME);

            logger.info("dirStr============" + dirStr);
            logger.info("dirs==============" + dirs);
            logger.info("successfile=======" + successfile);
            logger.info("filenum===========" + filenum);
            logger.info("sleeptime=========" + sleeptime);

        } catch (Exception e) {
            logger.error("初始化flume参数失败", e);
        }
    }

    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }
}
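
To wire this source into an agent, flume.conf references it by its fully qualified class name and supplies the four parameters read in initFlumeParams (the keys match the commented example near the top of the class). The channel settings below are placeholder values, and the Kafka sink is configured separately later:

# sketch of the source side of flume.conf (values are examples)
tier1.sources = source1
tier1.channels = channel1

tier1.sources.source1.type = com.hsiehchou.flume.source.FolderSource
tier1.sources.source1.dirs = /usr/chl/data/filedir/
tier1.sources.source1.successfile = /usr/chl/data/filedir_successful/
tier1.sources.source1.filenum = 3000
tier1.sources.source1.sleeptime = 5
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 5000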

2.2 实现process()方法
The process() method is already part of the FolderSource class shown in 2.1, so its code is not repeated here.


source/MySource.java—Flume官网上的案例

package com.hsiehchou.flume.source;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;

public class MySource extends AbstractSource implements Configurable, PollableSource {
    private String myProp;

    /**
     * 配置读取
     * @param context
     */
    @Override
    public void configure(Context context) {
        String myProp = context.getString("myProp", "defaultValue");
        // Process the myProp value (e.g. validation, convert to another type, ...)
        // Store myProp for later retrieval by process() method
        this.myProp = myProp;
    }

    /**
     * 定义自己的业务逻辑
     * @return
     * @throws EventDeliveryException
     */
    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        try {
            // This try clause includes whatever Channel/Event operations you want to do
            // Receive new data
            //需要把自己的数据封装为event进行传输
            Event e = new SimpleEvent();

            // Store the Event into this Source's associated Channel(s)
            getChannelProcessor().processEvent(e);
            status = Status.READY;
        } catch (Throwable t) {
            // Log exception, handle individual exceptions as needed
            status = Status.BACKOFF;
            // re-throw all Errors
            if (t instanceof Error) {
                throw (Error)t;
            }
        } finally {

        }
        return status;
    }

    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }

    @Override
    public void start() {
        // Initialize the connection to the external client
    }

    @Override
    public void stop () {
        // Disconnect from external client and do any additional cleanup
        // (e.g. releasing resources or nulling-out field values) ..
    }
}

3、自定义interceptor—数据清洗过滤器

3.1 实现Interceptor接口

package com.hsiehchou.flume.interceptor;

import com.alibaba.fastjson.JSON;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.service.DataCheck;
import org.apache.commons.io.Charsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.interceptor.Interceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * 数据清洗过滤器
 */
public class DataCleanInterceptor implements Interceptor {

    private static final Logger LOG = LoggerFactory.getLogger(DataCleanInterceptor.class);

    //datatype.properties
    //private static Map<String,ArrayList<String>> dataMap = DataTypeProperties.dataTypeMap;

    /**
     *  初始化
     */
    @Override
    public void initialize() {
    }

    /**
     * 单条处理
     * 拦截方法。数据解析,封装,数据清洗
     * @param event
     * @return
     */
    @Override
    public Event intercept(Event event) {
        SimpleEvent eventNew = new SimpleEvent();
        try {
            LOG.info("拦截器Event开始执行");
            Map<String, String> map = parseEvent(event);
            if(map == null){
                return null;
            }
            String lineJson = JSON.toJSONString(map);
            LOG.info("拦截器推送数据到channel:" +lineJson);
            eventNew.setBody(lineJson.getBytes());
        } catch (Exception e) {
            LOG.error("拦截器处理Event异常", e);
        }
        return eventNew;
    }

    /**
     * 批处理
     * @param events
     * @return
     */
    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> list = new ArrayList<Event>();
        for (Event event : events) {
            Event intercept = intercept(event);
            if (intercept != null) {
                list.add(intercept);
            }
        }
        return list;
    }

    @Override
    public void close() {
    }

    /**
     * 数据解析
     * @param event
     * @return
     */
    public static Map<String,String> parseEvent(Event event){
        if (event == null) {
            return null;
        }

        //000000000000000    000000000000000    24.000000    25.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    andiy    18609765432    judy            1789098763
        String line = new String(event.getBody(), Charsets.UTF_8);

        //文件名 和 文件绝对路径
        String filename = event.getHeaders().get(MapFields.FILENAME);
        String absoluteFilename = event.getHeaders().get(MapFields.ABSOLUTE_FILENAME);

        //String转map,进行数据校验,检验错误入ES错误表
        Map<String, String> map = DataCheck.txtParseAndalidation(line,filename,absoluteFilename);
        return map;

        //wechat_source1_1111115.txt
        //String[] fileNames = filename.split("_");

        // String转map,并进行数据长度校验,校验错误入ES错误表
        //Map<String, String> map = JZDataCheck.txtParse(type, line, source, filename,absoluteFilename);
        //Map<String,String> map = new HashMap<>();

        //000000000000000    000000000000000    24.000000    25.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    andiy    18609765432    judy            1789098763
        //String[] split = line.split("\t");

        //数据类别
        //String dataType = fileNames[0];

        //imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
        //ArrayList<String> fields = dataMap.get(dataType);
        //for (int i = 0; i < split.length; i++) {
        //    map.put(fields.get(i),split[i]);
        //}

        //添加ID
        //map.put(MapFields.ID, UUID.randomUUID().toString().replace("-",""));
        // map.put(MapFields.TABLE, dataType);
        // map.put(MapFields.FILENAME, filename);
        // map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);

//        Map<String, String> map = DataCheck.txtParseAndalidation(line,filename,absoluteFilename);
//        return map;
    }

    /**
     * 实例化创建
     */
    public static class Builder implements Interceptor.Builder {
        @Override
        public void configure(Context context) {
        }
        @Override
        public Interceptor build() {
            return new DataCleanInterceptor();
        }
    }
}
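
下面是一个调用该拦截器的最小示意(非项目正式代码):手工构造一条wechat数据的Event并执行intercept。注意intercept内部依赖DataCheck对应的datatype.properties配置以及错误表的ES地址,本地缺少这些环境时返回的Event可能是空body,这里仅演示调用方式;其中文件路径为假设值。

package com.hsiehchou.flume.interceptor;

import com.hsiehchou.flume.fields.MapFields;
import org.apache.flume.Event;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.interceptor.Interceptor;

import java.util.HashMap;
import java.util.Map;

public class DataCleanInterceptorDemo {

    public static void main(String[] args) {
        //一行wechat数据,共14个字段,以\t分隔,与前面注释里的示例数据一致
        String line = "000000000000000\t000000000000000\t24.000000\t25.000000\t"
                + "aa-aa-aa-aa-aa-aa\tbb-bb-bb-bb-bb-bb\t32109231\t1557305985\t"
                + "andiy\t18609765432\tjudy\t\t\t1789098763";

        //文件名与文件绝对路径通过Event的header传入(路径为假设值)
        Map<String, String> headers = new HashMap<String, String>();
        headers.put(MapFields.FILENAME, "wechat_source1_1111142.txt");
        headers.put(MapFields.ABSOLUTE_FILENAME, "/data/flume/success/wechat_source1_1111142.txt");

        SimpleEvent event = new SimpleEvent();
        event.setBody(line.getBytes());
        event.setHeaders(headers);

        Interceptor interceptor = new DataCleanInterceptor.Builder().build();
        Event result = interceptor.intercept(event);
        System.out.println(result == null ? "null" : new String(result.getBody()));
    }
}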

4、utils工具类

utils/FileUtilsStronger.java

package com.hsiehchou.flume.utils;

import com.hsiehchou.common.time.TimeTranstationUtils;
import com.hsiehchou.flume.fields.MapFields;
import org.apache.commons.io.FileUtils;
import org.apache.log4j.Logger;

import java.io.File;
import java.util.*;

import static java.io.File.separator;

public class FileUtilsStronger {

    private static final Logger logger = Logger.getLogger(FileUtilsStronger.class);

    /**
     * @param file
     * @param path
     */
    public static Map<String,Object> parseFile(File file, String path) {

        Map<String,Object> map=new HashMap<String,Object>();
        List<String> lines;
        String fileNew = path+ TimeTranstationUtils.Date2yyyy_MM_dd()+getDir(file);

        try {
            if((new File(fileNew+file.getName())).exists()){
                try{
                    logger.info("文件名已经存在,开始删除同名已经存在文件"+file.getAbsolutePath());
                    file.delete();
                    logger.info("删除同名已经存在文件"+file.getAbsolutePath()+"成功");
                }catch (Exception e){
                    logger.error("删除同名已经存在文件"+file.getAbsolutePath()+"失败",e);
                }
            }else{
                lines = FileUtils.readLines(file);
                map.put(MapFields.ABSOLUTE_FILENAME,fileNew+file.getName());
                map.put(MapFields.VALUE,lines);
                FileUtils.moveToDirectory(file, new File(fileNew), true);
                logger.info("移动文件到"+file.getAbsolutePath()+"到"+fileNew+"成功");
            }
        } catch (Exception e) {
            logger.error("移动文件" + file.getAbsolutePath() + "到" + fileNew + "失败", e);
        }
        return map;
    }

    /**
     * @param file
     * @param path
     */
    public static List<String> chanmodName(File file, String path) {

        List<String> lines=null;
        try {
            if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
                logger.warn("文件名已经存在,开始删除同名文件" +path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName());
                try{
                    file.delete();
                    logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
                }catch (Exception e){
                    logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
                }
            }else{
                lines = FileUtils.readLines(file);
                FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
                logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
            }
        } catch (Exception e) {
            logger.error("移动文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
        }
        return lines;
    }

    /**
     * @param file
     * @param path
     */
    public static void moveFile2unmanage(File file, String path) {

        try {
            if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
                logger.warn("文件名已经存在,开始删除同名文件" +file.getAbsolutePath());
                try{
                    file.delete();
                    logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
                }catch (Exception e){
                    logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
                }
            }else{
                FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
                //logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
            }
        } catch (Exception e) {
            logger.error("移动错误文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
        }
    }

    /**
     * @param file
     * @param path
     */
    public static void shnegtingChanmodName(File file, String path) {
        try {
            if((new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName())).exists()){
                logger.warn("文件名已经存在,开始删除同名文件" +path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"/"+file.getName());
                try{
                    file.delete();
                    logger.warn("删除同名文件"+file.getAbsolutePath()+"成功");
                }catch (Exception e){
                    logger.warn("删除同名文件"+file.getAbsolutePath()+"失败",e);
                }
            }else{
                FileUtils.moveToDirectory(file, new File(path+ TimeTranstationUtils.Date2yyyy_MM_dd()), true);
                logger.info("移动文件到"+file.getAbsolutePath()+"到"+path+ TimeTranstationUtils.Date2yyyy_MM_dd()+"成功");
            }
        } catch (Exception e) {
            logger.error("移动文件" + file.getName() + "到" + path+ TimeTranstationUtils.Date2yyyy_MM_dd() + "失败", e);
        }
    }

    /**
     * 获取文件父目录
     * @param file
     * @return
     */
    public static String getDir(File file){

        String dir=file.getParent();
        StringTokenizer dirs = new StringTokenizer(dir, separator);
        List<String> list=new ArrayList<String>();
        while(dirs.hasMoreTokens()){
            list.add((String)dirs.nextElement());
        }
        String str="";
        for(int i=2;i<list.size();i++){
            str=str+separator+list.get(i);
        }
        return str+"/";
    }
}
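
parseFile的返回结构在上面的代码里不算直观,这里给出一个最小调用示意(路径均为假设值,需要本地真实存在对应文件):

package com.hsiehchou.flume.utils;

import com.hsiehchou.flume.fields.MapFields;

import java.io.File;
import java.util.List;
import java.util.Map;

public class FileUtilsStrongerDemo {

    public static void main(String[] args) {
        //待解析的数据文件与成功目录前缀(假设值)
        File file = new File("/data/flume/dir1/wechat_source1_1111142.txt");
        String successPath = "/data/flume/success/";

        Map<String, Object> result = FileUtilsStronger.parseFile(file, successPath);

        //map中包含移动后的文件绝对路径,以及按行读取的文件内容
        String absoluteFilename = (String) result.get(MapFields.ABSOLUTE_FILENAME);
        @SuppressWarnings("unchecked")
        List<String> lines = (List<String>) result.get(MapFields.VALUE);

        System.out.println(absoluteFilename + " 共 " + (lines == null ? 0 : lines.size()) + " 行");
    }
}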

utils/Validation.java—验证工具类

package com.hsiehchou.flume.utils;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * 验证工具类
 */
@Deprecated
public class Validation {
     // ------------------常量定义
    /**
     * Email正则表达式=
     * "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$"
     * ;
     */
    // public static final String EMAIL =
    // "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";;
    public static final String EMAIL = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)+";

    /**
     * 电话号码正则表达式=
     * (^(\d{2,4}[-_-—]?)?\d{3,8}([-_-—]?\d{3,8})?([-_-—]?\d{1,7})?$)|
     * (^0?1[35]\d{9}$)
     */
    public static final String PHONE = "(^(\\d{2,4}[-_-—]?)?\\d{3,8}([-_-—]?\\d{3,8})?([-_-—]?\\d{1,7})?$)|(^0?1[35]\\d{9}$)";

    /**
     * 手机号码正则表达式=^(13[0-9]|15[0-9]|18[0-9])\d{8}$
     */
    public static final String MOBILE = "^((13[0-9])|(14[5-7])|(15[^4])|(17[0-8])|(18[0-9]))\\d{8}$";

    /**
     * Integer正则表达式 ^-?(([1-9]\d*$)|0)
     */
    public static final String INTEGER = "^-?(([1-9]\\d*$)|0)";

    /**
     * 正整数正则表达式 >=0 ^[1-9]\d*|0$
     */
    public static final String INTEGER_NEGATIVE = "^[1-9]\\d*|0$";

    /**
     * 负整数正则表达式 <=0 ^-[1-9]\d*|0$
     */
    public static final String INTEGER_POSITIVE = "^-[1-9]\\d*|0$";

    /**
     * Double正则表达式 ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
     */
    public static final String DOUBLE = "^-?([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0)$";

    /**
     * 正Double正则表达式 >=0 ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$ 
     */
    public static final String DOUBLE_NEGATIVE = "^[1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*|0?\\.0+|0$";

    /**
     * 负Double正则表达式 <= 0 ^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$
     */
    public static final String DOUBLE_POSITIVE = "^(-([1-9]\\d*\\.\\d*|0\\.\\d*[1-9]\\d*))|0?\\.0+|0$";

    /**
     * 年龄正则表达式 ^(?:[1-9][0-9]?|1[01][0-9]|120)$ 匹配0-120岁
     */
    public static final String AGE = "^(?:[1-9][0-9]?|1[01][0-9]|120)$";

    /**
     * 邮编正则表达式 [0-9]\d{5}(?!\d) 国内6位邮编
     */
    public static final String CODE = "[0-9]\\d{5}(?!\\d)";

    /**
     * 匹配由数字、26个英文字母或者下划线组成的字符串 ^\w+$
     */
    public static final String STR_ENG_NUM_ = "^\\w+$";

    /**
     * 匹配由数字和26个英文字母组成的字符串 ^[A-Za-z0-9]+$
     */
    public static final String STR_ENG_NUM = "^[A-Za-z0-9]+";

    /**
     * 匹配由26个英文字母组成的字符串 ^[A-Za-z]+$
     */
    public static final String STR_ENG = "^[A-Za-z]+$";

    /**
     * 过滤特殊字符串正则 regEx=
     * "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";
     */
    public static final String STR_SPECIAL = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?]";

    /***
     * 日期正则 支持: YYYY-MM-DD YYYY/MM/DD YYYY_MM_DD YYYYMMDD YYYY.MM.DD的形式
     */
    public static final String DATE_ALL = "((^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(10|12|0?[13578])([-\\/\\._]?)(3[01]|[12][0-9]|0?[1-9])$)"
            + "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(11|0?[469])([-\\/\\._]?)(30|[12][0-9]|0?[1-9])$)"
            + "|(^((1[8-9]\\d{2})|([2-9]\\d{3}))([-\\/\\._]?)(0?2)([-\\/\\._]?)(2[0-8]|1[0-9]|0?[1-9])$)|(^([2468][048]00)([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([3579][26]00)"
            + "([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)"
            + "|(^([1][89][0][48])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][0][48])([-\\/\\._]?)"
            + "(0?2)([-\\/\\._]?)(29)$)"
            + "|(^([1][89][2468][048])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|(^([2-9][0-9][2468][048])([-\\/\\._]?)(0?2)"
            + "([-\\/\\._]?)(29)$)|(^([1][89][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$)|"
            + "(^([2-9][0-9][13579][26])([-\\/\\._]?)(0?2)([-\\/\\._]?)(29)$))";

    /***
     * 日期正则 支持: YYYY-MM-DD
     */
    public static final String DATE_FORMAT1 = "(([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8]))))|((([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[3579][26])00))-02-29)";

    /**
     * URL正则表达式 匹配 http www ftp
     */
    public static final String URL = "^(http|www|ftp|)?(://)?(\\w+(-\\w+)*)(\\.(\\w+(-\\w+)*))*((:\\d+)?)(/(\\w+(-\\w+)*))*(\\.?(\\w)*)(\\?)?"
            + "(((\\w*%)*(\\w*\\?)*(\\w*:)*(\\w*\\+)*(\\w*\\.)*(\\w*&)*(\\w*-)*(\\w*=)*(\\w*%)*(\\w*\\?)*"
            + "(\\w*:)*(\\w*\\+)*(\\w*\\.)*"
            + "(\\w*&)*(\\w*-)*(\\w*=)*)*(\\w*)*)$";

    /**
     * 身份证正则表达式
     */
    public static final String IDCARD = "((11|12|13|14|15|21|22|23|31|32|33|34|35|36|37|41|42|43|44|45|46|50|51|52|53|54|61|62|63|64|65)[0-9]{4})"
            + "(([1|2][0-9]{3}[0|1][0-9][0-3][0-9][0-9]{3}"
            + "[Xx0-9])|([0-9]{2}[0|1][0-9][0-3][0-9][0-9]{3}))";

    /**
     * 机构代码
     */
    public static final String JIGOU_CODE = "^[A-Z0-9]{8}-[A-Z0-9]$";

    /**
     * 匹配数字组成的字符串 ^[0-9]+$
     */
    public static final String STR_NUM = "^[0-9]+$";

    // ------------------验证方法
    /**
     * 判断字段是否为空 符合返回true
     * @param str
     * @return boolean
     */
    public static synchronized boolean StrisNull(String str) {
        return null == str || str.trim().length() <= 0;
    }

    /**
     * 判断字段是非空 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean StrNotNull(String str) {
        return !StrisNull(str);
    }

    /**
     * 字符串null转空
     * @param str
     * @return boolean
     */
    public static String nulltoStr(String str) {
        return StrisNull(str) ? "" : str;
    }

    /**
     * 字符串null赋值默认值
     * @param str 目标字符串
     * @param defaut 默认值
     * @return String
     */
    public static String nulltoStr(String str, String defaut) {
        return StrisNull(str) ? defaut : str;
    }

    /**
     * 判断字段是否为Email 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isEmail(String str) {
        return Regular(str, EMAIL);
    }

    /**
     * 判断是否为电话号码 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isPhone(String str) {
        return Regular(str, PHONE);
    }

    /**
     * 判断是否为手机号码 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isMobile(String str) {
        return RegularSJHM(str, MOBILE);
    }

    /**
     * 判断是否为Url 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isUrl(String str) {
        return Regular(str, URL);
    }

    /**
     * 判断字段是否为数字 正负整数 正负浮点数 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isNumber(String str) {
        return Regular(str, DOUBLE);
    }

    /**
     * 判断字段是否为INTEGER 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isInteger(String str) {
        return Regular(str, INTEGER);
    }

    /**
     * 判断字段是否为正整数正则表达式 >=0 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isINTEGER_NEGATIVE(String str) {
        return Regular(str, INTEGER_NEGATIVE);
    }

    /**
     * 判断字段是否为负整数正则表达式 <=0 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isINTEGER_POSITIVE(String str) {
        return Regular(str, INTEGER_POSITIVE);
    }

    /**
     * 判断字段是否为DOUBLE 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isDouble(String str) {
        return Regular(str, DOUBLE);
    }

    /**
     * 判断字段是否为正浮点数正则表达式 >=0 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isDOUBLE_NEGATIVE(String str) {
        return Regular(str, DOUBLE_NEGATIVE);
    }

    /**
     * 判断字段是否为负浮点数正则表达式 <=0 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isDOUBLE_POSITIVE(String str) {
        return Regular(str, DOUBLE_POSITIVE);
    }

    /**
     * 判断字段是否为日期 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isDate(String str) {
        return Regular(str, DATE_ALL);
    }

    /**
     * 验证
     * @param str
     * @return
     */
    public static boolean isDate1(String str) {
        return Regular(str, DATE_FORMAT1);
    }

    /**
     * 判断字段是否为年龄 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isAge(String str) {
        return Regular(str, AGE);
    }

    /**
     * 判断字段是否超长 字串为空返回false,超过长度{leng}返回true,反之返回false
     * @param str
     * @param leng
     * @return boolean
     */
    public static boolean isLengOut(String str, int leng) {
        return StrisNull(str) ? false : str.trim().length() > leng;
    }

    /**
     * 判断字段是否为身份证 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isIdCard(String str) {
        if (StrisNull(str))
            return false;
        if (str.trim().length() == 15 || str.trim().length() == 18) {
            return Regular(str, IDCARD);
        } else {
            return false;
        }
    }

    /**
     * 判断字段是否为邮编 符合返回true
     * @param str
     * @return boolean
     */
    public static boolean isCode(String str) {
        return Regular(str, CODE);
    }

    /**
     * 判断字符串是不是全部是英文字母
     * @param str
     * @return boolean
     */
    public static boolean isEnglish(String str) {
        return Regular(str, STR_ENG);
    }

    /**
     * 判断字符串是不是全部是英文字母+数字
     * @param str
     * @return boolean
     */
    public static boolean isENG_NUM(String str) {
        return Regular(str, STR_ENG_NUM);
    }

    /**
     * 判断字符串是不是全部是英文字母+数字+下划线
     * @param str
     * @return boolean
     */
    public static boolean isENG_NUM_(String str) {
        return Regular(str, STR_ENG_NUM_);
    }

    /**
     * 过滤特殊字符串 返回过滤后的字符串
     * @param str
     * @return boolean
     */
    public static String filterStr(String str) {
        Pattern p = Pattern.compile(STR_SPECIAL);
        Matcher m = p.matcher(str);
        return m.replaceAll("").trim();
    }

    /**
     * 校验机构代码格式
     * @return
     */
    public static boolean isJigouCode(String str) {
        return Regular(str, JIGOU_CODE);
    }

    /**
     * 判断字符串是不是数字组成
     * @param str
     * @return boolean
     */
    public static boolean isSTR_NUM(String str) {
        return Regular(str, STR_NUM);
    }

    /**
     * 匹配是否符合正则表达式pattern 匹配返回true
     * @param str 匹配的字符串
     * @param pattern 匹配模式
     * @return boolean
     */
    private static boolean Regular(String str, String pattern) {
        if (null == str || str.trim().length() <= 0)
            return false;
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(str);
        return m.matches();
    }

    /**
     * 匹配是否符合正则表达式pattern 匹配返回true
     * @param str 匹配的字符串
     * @param pattern 匹配模式
     * @return boolean
     */
    private static boolean RegularSJHM(String str, String pattern) {
        if (null == str || str.trim().length() <= 0){
            return false;
        }
        if(str.contains("+86")){
            str=str.replace("+86","");
        }
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(str);
        return m.matches();
    }

    /**
     * description:匹配yyyyMMddHHmmss格式时间
     * @param time
     * @return boolean
     */
    public static final String yyyyMMddHHmmss = "[0-9]{14}";

    public static boolean isyyyyMMddHHmmss(String time) {
        if (time == null) {
            return false;
        }
        boolean bool = time.matches(yyyyMMddHHmmss);
        return bool;
    }

    /**
     * description:匹配MAC地址格式,形如aa-aa-aa-aa-aa-aa
     * @param mac
     * @return boolean
     */
    public static final String isMac = "^[A-Fa-f0-9]{2}(-[A-Fa-f0-9]{2}){5}$";

    public static boolean isMac(String mac) {
        if (mac == null) {
            return false;
        }
        boolean bool = mac.matches(isMac);
        return bool;
    }

    /**
     * description:匹配10位(秒级)时间戳
     * @param timestamp
     * @return boolean
     */
    public static final String longtime = "[0-9]{10}";

    public static boolean isTimestamp(String timestamp) {
        if (timestamp == null) {
            return false;
        }
        boolean bool = timestamp.matches(longtime);
        return bool;
    }

    /**
     * 判断字段是否为datatype 符合返回true
     * @param str
     * @return boolean
     */
    public static final String DATATYPE = "^\\d{7}$";
    public static boolean isDATATYPE(String str) {
        return Regular(str, DATATYPE);
    }

    /**
     * 判断字段是否为QQ 符合返回true
     * @param str
     * @return boolean
     */
    public static final String QQ = "^\\d{5,15}$";
    public static boolean isQQ(String str) {
        return Regular(str, QQ);
    }


    /**
     * 判断字段是否为IMSI 符合返回true
     * @param str
     * @return boolean
     */
    public static final String IMSI = "^4600[0-79]\\d{10}|(46011|46020)\\d{10}$";
    public static boolean isIMSI(String str) {
        return Regular(str, IMSI);
    }

    /**
     * 判断字段是否为IMEI 符合返回true
     * @param str
     * @return boolean
     */
    public static final String IMEI = "^\\d{8}$|^[a-fA-F0-9]{14}$|^\\d{15}$";
    public static boolean isIMEI(String str) {return Regular(str, IMEI);}

    /**
     * 判断字段是否为CAPTURETIME 符合返回true
     * @param str
     * @return boolean
     */
    public static final String CAPTURETIME = "^\\d{10}|(20[0-9][0-9])\\d{10}$";
    public static boolean isCAPTURETIME(String str) {return Regular(str, CAPTURETIME);}

    /**
     * description:检测认证类型
     * @param str
     * @return boolean
     */
    public static final String AUTH_TYPE = "^\\d{7}$";
    public static boolean isAUTH_TYPE(String str) {return Regular(str, AUTH_TYPE);}


    /**
     * description:检测FIRM_CODE
     * @param str
     * @return boolean
     */
    public static final String FIRM_CODE = "^\\d{9}$";
    public static boolean isFIRM_CODE(String str) {return Regular(str, FIRM_CODE);}

    /**
     * description:检测经度
     * @param str
     * @return boolean
     */
    public static final String LONGITUDE = "^-?(([1-9]\\d?)|(1[0-7]\\d)|180)(\\.\\d{1,6})?$";


    //public static final String LONGITUDE ="^([-]?(\\d|([1-9]\\d)|(1[0-7]\\d)|(180))(\\.\\d*)\\,[-]?(\\d|([1-8]\\d)|(90))(\\.\\d*))$";
    public static boolean isLONGITUDE(String str) {return Regular(str, LONGITUDE);}

    /**
     * description:检测纬度
     *
     * @param str
     * @return boolean
     */
    public static final String LATITUDE = "^-?(([1-8]\\d?)|([1-8]\\d)|90)(\\.\\d{1,6})?$";
    public static boolean isLATITUDE(String str) {return Regular(str, LATITUDE);}

    public static void main(String[] args) {
        boolean bool = isLATITUDE("25.546685");
        System.out.println(bool);
    }
}
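
这个工具类的方法都比较直接,下面用几个典型取值做一个简单的自测示意(注释里的结果是按上面正则推断的预期值):

package com.hsiehchou.flume.utils;

public class ValidationDemo {

    public static void main(String[] args) {
        //手机号校验:RegularSJHM会先去掉"+86"前缀再匹配MOBILE正则
        System.out.println(Validation.isMobile("+8618609765432")); //true
        //MAC校验:要求aa-aa-aa-aa-aa-aa的形式,冒号分隔不通过
        System.out.println(Validation.isMac("aa-aa-aa-aa-aa-aa"));  //true
        System.out.println(Validation.isMac("aa:aa:aa:aa:aa:aa"));  //false
        //10位(秒级)时间戳校验
        System.out.println(Validation.isTimestamp("1557305985"));   //true
        //身份证校验:长度必须为15或18位且符合IDCARD正则
        System.out.println(Validation.isIdCard("123"));             //false
    }
}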

5、constant常量

constant/FlumeConfConstant.java

package com.hsiehchou.flume.constant;

public class FlumeConfConstant {

    //flumeSource配置
    public static final String UNMANAGE="unmanage";
    public static final String DIRS="dirs";
    public static final String SUCCESSFILE="successfile";
    public static final String ALL="all";
    public static final String SOURCE="source";
    public static final String FILENUM="filenum";
    public static final String SLEEPTIME="sleeptime";

    //ESSINK配置
    public static final String TIMECELL="timecell";
    public static final String MAXNUM="maxnum";
    public static final String SINK_SOURCE="source";
    public static final String THREADNUM="threadnum";
    public static final String REDISHOST="redishost";
}

constant/TxtConstant.java

package com.hsiehchou.flume.constant;

public class TxtConstant {

    public static final String TYPE_ES="TYPE_ES";

    public static final String STATIONCENTER="STATIONCENTER";
    public static final String APCENTER="APCENTER";
    public static final String IPLOGINLOG="IPLOGINLOG";
    public static final String IMSIIMEI="IMSIIMEI";
    public static final String MACHOUR="MACHOUR";


    public static final String TYPE_SITEMANAGE="TYPE_SITEMANAGE";
    public static final String JZWA="JZWA";


    public static final String FIRMCODE="FIRMCODE";

    public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";

    public static final String FILENAME_FIELDS2="FILENAME_FIELDS2";

    public static final String FILENAME_FIELDS3="FILENAME_FIELDS3";

    public static final String FILENAME_FIELDS4="FILENAME_FIELDS4";

    public static final String FILENAME_FIELDS5="FILENAME_FIELDS5";

    public static final String FILENAME_VALIDATION="FILENAME_VALIDATION";

    public static final String AUTHTYPE_LIST="AUTHTYPE_LIST";

    public static final String SOURCE_FEIJING="SOURCE_FEIJING";
    public static final String SOURCE_650="SOURCE_650";
    public static final String OFFICE_11="OFFICE_11";
    public static final String OFFICE_12="OFFICE_12";
    public static final String WLZK="WLZK";
    public static final String FEIJING="FEIJING";
    public static final String HLWZC="HLWZC";
    public static final String WIFIWL="WIFIWL";

    // 错误索引
    public static final String ERROR_INDEX="es.errorindex";
    public static final String ERROR_TYPE="es.errortype";

    //WIFI索引
    public static final String WIFILOG_INDEX="es.index.wifilog";
    public static final String IPLOGINLOG_TYPE="es.type.iploginlog";
    public static final String EMAIL_TYPE="es.type.email";
    public static final String FTP_TYPE="es.type.ftp";
    public static final String GAME_TYPE="es.type.game";
    public static final String HEARTBEAT_TYPE="es.type.heartbeat";
    public static final String HTTP_TYPE="es.type.http";
    public static final String IMINFO_TYPE="es.type.iminfo";
    public static final String ORGANIZATION_TYPE="es.type.organization";
    public static final String SEARCH_TYPE="es.type.search";
    public static final String IMSIIMEI_TYPE="es.type.imsiimei";
}

6、field字段

fields/ErrorMapFields.java

package com.hsiehchou.flume.fields;

public class ErrorMapFields {

    public static final String RKSJ="RKSJ";

    public static final String RECORD="RECORD";

    public static final String LENGTH="LENGTH";
    public static final String LENGTH_ERROR="LENGTH_ERROR";
    public static final String LENGTH_ERROR_NUM="10001";

    public static final String FILENAME="FILENAME";
    public static final String FILENAME_ERROR="FILENAME_ERROR";
    public static final String FILENAME_ERROR_NUM="10010";
    public static final String ABSOLUTE_FILENAME="ABSOLUTE_FILENAME";


    public static final String SJHM="SJHM";
    public static final String SJHM_ERROR="SJHM_ERROR";
    public static final String SJHM_ERRORCODE="10007";


    public static final String DATA_TYPE="DATA_TYPE";
    public static final String DATA_TYPE_ERROR="DATA_TYPE_ERROR";
    public static final String DATA_TYPE_ERRORCODE="10011";

    public static final String QQ="QQ";
    public static final String QQ_ERROR="QQ_ERROR";
    public static final String QQ_ERRORCODE="10002";

    public static final String IMSI="IMSI";
    public static final String IMSI_ERROR="IMSI_ERROR";
    public static final String IMSI_ERRORCODE="10005";

    public static final String IMEI="IMEI";
    public static final String IMEI_ERROR="IMEI_ERROR";
    public static final String IMEI_ERRORCODE="10006";

    public static final String MAC="MAC";
    public static final String CLIENTMAC="CLIENTMAC";
    public static final String STATIONMAC="STATIONMAC";
    public static final String BSSID="BSSID";
    public static final String MAC_ERROR="MAC_ERROR";
    public static final String MAC_ERRORCODE="10003";

    public static final String DEVICENUM="DEVICENUM";
    public static final String DEVICENUM_ERROR="DEVICENUM_ERROR";
    public static final String DEVICENUM_ERRORCODE="10014";

    public static final String CAPTURETIME="CAPTURETIME";
    public static final String CAPTURETIME_ERROR="CAPTURETIME_ERROR";
    public static final String CAPTURETIME_ERRORCODE="10019";

    public static final String EMAIL="EMAIL";
    public static final String EMAIL_ERROR="EMAIL_ERROR";
    public static final String EMAIL_ERRORCODE="10004";

    public static final String AUTH_TYPE="AUTH_TYPE";
    public static final String AUTH_TYPE_ERROR="AUTH_TYPE_ERROR";
    public static final String AUTH_TYPE_ERRORCODE="10020";

    public static final String FIRM_CODE="FIRM_CODE";
    public static final String FIRMCODE_NUM="FIRMCODE_NUM";
    public static final String FIRM_CODE_ERROR="FIRM_CODE_ERROR";
    public static final String FIRM_CODE_ERRORCODE="10009";

    public static final String STARTTIME="STARTTIME";
    public static final String STARTTIME_ERROR="STARTTIME_ERROR";
    public static final String STARTTIME_ERRORCODE="10015";
    public static final String ENDTIME="ENDTIME";
    public static final String ENDTIME_ERROR="ENDTIME_ERROR";
    public static final String ENDTIME_ERRORCODE="10016";

    public static final String LOGINTIME="LOGINTIME";
    public static final String LOGINTIME_ERROR="LOGINTIME_ERROR";
    public static final String LOGINTIME_ERRORCODE="10017";
    public static final String LOGOUTTIME="LOGOUTTIME";
    public static final String LOGOUTTIME_ERROR="LOGOUTTIME_ERROR";
    public static final String LOGOUTTIME_ERRORCODE="10018";

    public static final String LONGITUDE="LONGITUDE";
    public static final String LONGITUDE_ERROR="LONGITUDE_ERROR";
    public static final String LONGITUDE_ERRORCODE="10012";
    public static final String LATITUDE="LATITUDE";
    public static final String LATITUDE_ERROR="LATITUDE_ERROR";
    public static final String LATITUDE_ERRORCODE="10013";

    //TODO 其他类型DATA_TYPE  记录
    public static final String DATA_TYPE_OTHER="DATA_TYPE_OTHER";
    public static final String DATA_TYPE_OTHER_ERROR="DATA_TYPE_OTHER_ERROR";
    public static final String DATA_TYPE_OTHER_ERRORCODE="10022";

    //TODO USERNAME 错误
    public static final String USERNAME="USERNAME";
    public static final String USERNAME_ERROR="USERNAME_ERROR";
    public static final String USERNAME_ERRORCODE="10023";
}

fields/MapFields.java

package com.hsiehchou.flume.fields;

public class MapFields {

    public static final String ID="id";
    public static final String SOURCE="source";
    public static final String TYPE="TYPE";
    public static final String TABLE="table";
    public static final String FILENAME="filename";
    public static final String RKSJ="rksj";
    public static final String ABSOLUTE_FILENAME="absolute_filename";
    public static final String BSSID="BSSID";
    public static final String USERNAME="USERNAME";
    public static final String DAYID="DAYID";

    public static final String FIRMCODE_NUM="FIRMCODE_NUM";
    public static final String FIRM_CODE="FIRM_CODE";
    public static final String IMEI="IMEI";
    public static final String IMSI="IMSI";

    public static final String DATA_TYPE_NAME="DATA_TYPE_NAME";

    public static final String AUTH_TYPE="AUTH_TYPE";
    public static final String AUTH_ACCOUNT="AUTH_ACCOUNT";

    //TODO 时间类参数
    public static final String CAPTURETIME="CAPTURETIME";
    public static final String LOGINTIME="LOGINTIME";
    public static final String LOGOUTTIME="LOGOUTTIME";
    public static final String STARTTIME="STARTTIME";
    public static final String ENDTIME="ENDTIME";
    public static final String FIRSTTIME="FIRSTTIME";
    public static final String LASTTIME="LASTTIME";

    //TODO 去重参数
    public static final String COUNT="COUNT";
    public static final String DATA_TYPE="DATA_TYPE";
    public static final String VALUE="value";
    public static final String SITECODE="SITECODE";
    public static final String SITECODENEW="SITECODENEW";

    public static final String DEVICENUM="DEVICENUM";
    public static final String MAC="MAC";
    public static final String CLIENTMAC="CLIENTMAC";
    public static final String STATIONMAC="STATIONMAC";

    public static final String BRAND="BRAND";
    public static final String INDEX="INDEX";
    public static final String ACTION_TYPE="ACTION_TYPE";


    public static final String CITY_CODE="CITY_CODE";
    /* public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
    public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
    public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";
    public static final String FILENAME_FIELDS1="FILENAME_FIELDS1";*/

}

7、自定义sink

sink/KafkaSink.java—将数据下沉到kafka

package com.hsiehchou.flume.sink;

import com.google.common.base.Throwables;
import com.hsiehchou.kafka.producer.StringProducer;
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.apache.log4j.Logger;

import java.util.ArrayList;
import java.util.List;

public class KafkaSink extends AbstractSink implements Configurable {

    private final Logger logger = Logger.getLogger(KafkaSink.class);
    private String[] kafkatopics = null;
    //private List<KeyedMessage<String,String>> listKeyedMessage=null;
    private List<String> listKeyedMessage=null;
    private Long proTimestamp=System.currentTimeMillis();

    /**
     * 配置读取
     * @param context
     */
    @Override
    public void configure(Context context) {
        //tier1.sinks.sink1.kafkatopic=chl_test7
        //获取 推送kafkatopic参数
        kafkatopics = context.getString("kafkatopics").split(",");
        logger.info("获取kafka topic配置" + context.getString("kafkatopics"));
        listKeyedMessage=new ArrayList<>();
    }

    @Override
    public Status process() throws EventDeliveryException {

        logger.info("sink开始执行");
        Channel channel = getChannel();
        Transaction transaction = channel.getTransaction();
        transaction.begin();
        try {
            //从channel中拿到event
            Event event = channel.take();
            if (event == null) {
                transaction.rollback();
                return Status.BACKOFF;
            }
            // 解析记录,获取事件内容
            String record = new String(event.getBody());
            // 发送数据到kafka
            try {
                //调用kafka的消息推送,将数据推送到kafka
                StringProducer.producer(kafkatopics[0], record);
            /*    if(listKeyedMessage.size()>1000){
                    logger.info("数据大与10000,推送数据到kafka");
                    sendListKeyedMessage();
                    logger.info("数据大与10000,推送数据到kafka成功");
                }else if(System.currentTimeMillis()-proTimestamp>=60*1000){
                    logger.info("时间间隔大与60,推送数据到kafka");
                    sendListKeyedMessage();
                    logger.info("时间间隔大与60,推送数据到kafka成功"+listKeyedMessage.size());
                }*/
            } catch (Exception e) {
                logger.error("推送数据到kafka失败" , e);
                throw Throwables.propagate(e);
            }
            transaction.commit();
            return Status.READY;
        } catch (ChannelException e) {
            logger.error(e);
            transaction.rollback();
            return Status.BACKOFF;
        } finally {
            if(transaction != null){
                transaction.close();
            }
        }
    }

    @Override
    public synchronized void stop() {
        super.stop();
    }

    /*private void sendListKeyedMessage(){
        Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
        producer.send(listKeyedMessage);
        listKeyedMessage.clear();
        proTimestamp=System.currentTimeMillis();
        producer.close();
    }*/
}
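
KafkaSink只依赖一个kafkatopics参数,支持逗号分隔多个topic,但上面的process()目前只向kafkatopics[0]推送。下面是配置读取部分的最小示意(topic名为假设值,真正的process()由Flume框架驱动,并且需要可用的kafka环境):

package com.hsiehchou.flume.sink;

import org.apache.flume.Context;

import java.util.HashMap;
import java.util.Map;

public class KafkaSinkConfigDemo {

    public static void main(String[] args) {
        //等价于agent配置文件中的 tier1.sinks.sink1.kafkatopics=chl_test7,chl_test8
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("kafkatopics", "chl_test7,chl_test8");

        KafkaSink sink = new KafkaSink();
        sink.configure(new Context(conf));
        //之后由Flume框架把sink挂到channel上并循环调用process()
    }
}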

8、service

service/DataCheck.java—数据校验

package com.hsiehchou.flume.service;

import com.alibaba.fastjson.JSON;
import com.hsiehchou.common.net.HttpRequest;
import com.hsiehchou.common.project.datatype.DataTypeProperties;
import com.hsiehchou.common.time.TimeTranstationUtils;
import com.hsiehchou.flume.fields.ErrorMapFields;
import com.hsiehchou.flume.fields.MapFields;
import org.apache.log4j.Logger;

import java.util.*;

/**
 * 数据校验
 */
public class DataCheck {

    private final static Logger LOG = Logger.getLogger(DataCheck.class);

    /**
     * 获取数据类型对应的字段  对应的文件
     * 结构为 [ 数据类型1 = [字段1,字段2。。。。],
     * 数据类型2 = [字段1,字段2。。。。]]
     */
    private static Map<String, ArrayList<String>> dataMap = DataTypeProperties.dataTypeMap;

    /**
     * 数据解析
     * @param line
     * @param fileName
     * @param absoluteFilename
     * @return
     */
    public static Map<String, String> txtParse(String line, String fileName, String absoluteFilename) {

        Map<String, String> map = new HashMap<String, String>();
        String[] fileNames = fileName.split("_");
        String dataType = fileNames[0];

        if (dataMap.containsKey(dataType)) {
            List<String> fields = dataMap.get(dataType.toLowerCase());
            String[] splits = line.split("\t");
            //长度校验
            if (fields.size() == splits.length) {
                //添加公共字段
                map.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
                map.put(MapFields.TABLE, dataType.toLowerCase());
                map.put(MapFields.RKSJ, (System.currentTimeMillis() / 1000) + "");
                map.put(MapFields.FILENAME, fileName);
                map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);
                for (int i = 0; i < splits.length; i++) {
                    map.put(fields.get(i), splits[i]);
                }
            } else {
                map = null;
                LOG.error("字段长度不匹配fields"+fields.size()  + "/t" + splits.length);
            }
        } else {
            map = null;
            LOG.error("配置文件中不存在此数据类型");
        }
        return map;
    }

    /**
     * 数据长度校验添加必要字段并转map,将长度不符合的插入ES数据库
     * @param line
     * @param fileName
     * @param absoluteFilename
     * @return
     */
    public static Map<String, String> txtParseAndalidation(String line, String fileName, String absoluteFilename) {

        Map<String, String> map = new HashMap<String, String>();
        Map<String, Object> errorMap = new HashMap<String, Object>();

        //文件名按"_"切分  wechat_source1_1111142.txt
        //wechat 数据类型
        //source1 数据来源
        //1111142  不让文件名相同
        String[] fileNames = fileName.split("_");
        String dataType = fileNames[0];
        String source = fileNames[1];

        if (dataMap.containsKey(dataType)) {
            //获取数据类型字段
            // imei,imsi,longitude,latitude,phone_mac,device_mac,device_number,collect_time,username,phone,object_username,send_message,accept_message,message_time
            //根据数据类型,获取该类型的字段
            List<String> fields = dataMap.get(dataType.toLowerCase());
            //line
            String[] splits = line.split("\t");

            //长度校验
            if (fields.size() == splits.length) {
                for (int i = 0; i < splits.length; i++) {
                    map.put(fields.get(i), splits[i]);
                }
                //添加公共字段
                // map.put(SOURCE, source);
                map.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
                map.put(MapFields.TABLE, dataType.toLowerCase());
                map.put(MapFields.RKSJ, (System.currentTimeMillis() / 1000) + "");
                map.put(MapFields.FILENAME, fileName);
                map.put(MapFields.ABSOLUTE_FILENAME, absoluteFilename);

                //数据封装完成  开始进行数据校验
                errorMap = DataValidation.dataValidation(map);
            } else {
                errorMap.put(ErrorMapFields.LENGTH, "字段数不匹配 预期" + fields.size() + "\t" + "实际" + splits.length);
                errorMap.put(ErrorMapFields.LENGTH_ERROR, ErrorMapFields.LENGTH_ERROR_NUM);
                LOG.info("字段数不匹配 预期" + fields.size() + "\t" + "实际" + splits.length);
                map = null;
            }

            //判断数据是否存在错误
            if (null != errorMap && errorMap.size() > 0) {
                LOG.info("errorMap===" + errorMap);
                if ("1".equals("1")) {
                    //addErrorMapES(errorMap, map, fileName, absoluteFilename);
                    //验证没通过,将错误数据写到ES,并将map置空
                    addErrorMapESByHTTP(errorMap, map, fileName, absoluteFilename);
                }
                map = null;
            }
        } else {
            map = null;
            LOG.error("配置文件中不存在此数据类型");
        }
        return map;
    }

    /**
     *  将错误信息写入ES,方便查错
     * @param errorMap
     * @param map
     * @param fileName
     * @param absoluteFilename
     */
    public static void addErrorMapESByHTTP(Map<String, Object> errorMap, Map<String, String> map, String fileName, String absoluteFilename) {

        String errorType = fileName.split("_")[0];
        errorMap.put(MapFields.TABLE, errorType);
        errorMap.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
        errorMap.put(ErrorMapFields.RECORD, map);
        errorMap.put(ErrorMapFields.FILENAME, fileName);
        errorMap.put(ErrorMapFields.ABSOLUTE_FILENAME, absoluteFilename);
        errorMap.put(ErrorMapFields.RKSJ, TimeTranstationUtils.Date2yyyy_MM_dd_HH_mm_ss());
        String url="http://192.168.116.201:9200/error_recourd/error_recourd/"+ errorMap.get(MapFields.ID).toString();
        String json = JSON.toJSONString(errorMap);
        HttpRequest.sendPost(url,json);
        //HttpRequest.sendPostMessage(url, errorMap);
    }

    /*
    public static void addErrorMapES(Map<String, Object> errorMap, Map<String, String> map, String fileName, String absoluteFilename) {

        String errorType = fileName.split("_")[0];
        errorMap.put(MapFields.TABLE, errorType);
        errorMap.put(MapFields.ID, UUID.randomUUID().toString().replace("-", ""));
        errorMap.put(ErrorMapFields.RECORD, map);
        errorMap.put(ErrorMapFields.FILENAME, fileName);
        errorMap.put(ErrorMapFields.ABSOLUTE_FILENAME, absoluteFilename);
        errorMap.put(ErrorMapFields.RKSJ, TimeTranstationUtils.Date2yyyy_MM_dd_HH_mm_ss());


        TransportClient client = null;
        try {
            LOG.info("开始获取客户端===============================" + errorMap);
            client = ESClientUtils.getClient();
        } catch (Throwable t) {
            if (t instanceof Error) {
                throw (Error)t;
            }
            LOG.error(null,t);
        }
        //JestClient jestClient = JestService.getJestClient();
        //boolean bool = JestService.indexOne(jestClient,TxtConstant.ERROR_INDEX, TxtConstant.ERROR_TYPE,errorMap.get(MapFields.ID).toString(),errorMap);
        LOG.info("开始写入错误数据到ES===============================" + errorMap);
        boolean bool = IndexUtil.putIndexData(TxtConstant.ERROR_INDEX, TxtConstant.ERROR_TYPE, errorMap.get(MapFields.ID).toString(), errorMap,client);
        if(bool){
            LOG.info("写入错误数据到ES===============================" + errorMap);
        }else{
            LOG.info("写入错误数据到ES===============================失败");
        }

    }*/

    public static void main(String[] args) {

    }
}
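
为了说明校验失败时的走向,下面故意构造一条字段数不够的记录来调用txtParseAndalidation(文件路径为假设值)。长度校验不通过时,错误信息会通过addErrorMapESByHTTP写入error_recourd索引(要求该ES地址可达),方法本身返回null:

package com.hsiehchou.flume.service;

import java.util.Map;

public class DataCheckDemo {

    public static void main(String[] args) {
        //wechat类型应为14个字段,这里只给3个,触发长度校验失败
        String badLine = "000000000000000\t000000000000000\t24.000000";

        Map<String, String> map = DataCheck.txtParseAndalidation(
                badLine,
                "wechat_source1_1111142.txt",
                "/data/flume/success/wechat_source1_1111142.txt");

        //校验失败时返回null,校验通过的记录才会继续向channel/kafka流转
        System.out.println(map); //null
    }
}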

service/DataValidation.java

package com.hsiehchou.flume.service;

import com.hsiehchou.flume.fields.ErrorMapFields;
import com.hsiehchou.flume.fields.MapFields;
import com.hsiehchou.flume.utils.Validation;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataValidation {
    private static final Logger LOG = LoggerFactory.getLogger(DataValidation.class);

   //  private static final TxtConfigurationFileReader reader = TxtConfigurationFileReader.getInstance();
   //  private static final DataTypeConfigurationFileReader datatypereader = DataTypeConfigurationFileReader.getInstance();
   //  private static final ValidationConfigurationFileReader readerValidation = ValidationConfigurationFileReader.getInstance();

    private static Map<String,String>  dataTypeMap;
    private static List<String> listAuthType;
    private static String isErrorES;
    private static final String USERNAME=ErrorMapFields.USERNAME;

    private static final String DATA_TYPE=ErrorMapFields.DATA_TYPE;
    private static final String DATA_TYPE_ERROR=ErrorMapFields.DATA_TYPE_ERROR;
    private static final String DATA_TYPE_ERRORCODE=ErrorMapFields.DATA_TYPE_ERRORCODE;

    private static final String SJHM=ErrorMapFields.SJHM;
    private static final String SJHM_ERROR=ErrorMapFields.SJHM_ERROR;
    private static final String SJHM_ERRORCODE=ErrorMapFields.SJHM_ERRORCODE;

    private static final String QQ=ErrorMapFields.QQ;
    private static final String QQ_ERROR=ErrorMapFields.QQ_ERROR;
    private static final String QQ_ERRORCODE=ErrorMapFields.QQ_ERRORCODE;

    private static final String IMSI=ErrorMapFields.IMSI;
    private static final String IMSI_ERROR=ErrorMapFields.IMSI_ERROR;
    private static final String IMSI_ERRORCODE=ErrorMapFields.IMSI_ERRORCODE;

    private static final String IMEI=ErrorMapFields.IMEI;
    private static final String IMEI_ERROR=ErrorMapFields.IMEI_ERROR;
    private static final String IMEI_ERRORCODE=ErrorMapFields.IMEI_ERRORCODE;

    private static final String MAC=ErrorMapFields.MAC;
    private static final String CLIENTMAC=ErrorMapFields.CLIENTMAC;
    private static final String STATIONMAC=ErrorMapFields.STATIONMAC;
    private static final String BSSID=ErrorMapFields.BSSID;
    private static final String MAC_ERROR=ErrorMapFields.MAC_ERROR;
    private static final String MAC_ERRORCODE=ErrorMapFields.MAC_ERRORCODE;

    private static final String DEVICENUM=ErrorMapFields.DEVICENUM;
    private static final String DEVICENUM_ERROR=ErrorMapFields.DEVICENUM_ERROR;
    private static final String DEVICENUM_ERRORCODE=ErrorMapFields.DEVICENUM_ERRORCODE;

    private static final String CAPTURETIME=ErrorMapFields.CAPTURETIME;
    private static final String CAPTURETIME_ERROR=ErrorMapFields.CAPTURETIME_ERROR;
    private static final String CAPTURETIME_ERRORCODE=ErrorMapFields.CAPTURETIME_ERRORCODE;

    private static final String EMAIL=ErrorMapFields.EMAIL;
    private static final String EMAIL_ERROR=ErrorMapFields.EMAIL_ERROR;
    private static final String EMAIL_ERRORCODE=ErrorMapFields.EMAIL_ERRORCODE;

    private static final String AUTH_TYPE=ErrorMapFields.AUTH_TYPE;
    private static final String AUTH_TYPE_ERROR=ErrorMapFields.AUTH_TYPE_ERROR;
    private static final String AUTH_TYPE_ERRORCODE=ErrorMapFields.AUTH_TYPE_ERRORCODE;

    private static final String FIRM_CODE=ErrorMapFields.FIRM_CODE;
    private static final String FIRM_CODE_ERROR=ErrorMapFields.FIRM_CODE_ERROR;
    private static final String FIRM_CODE_ERRORCODE=ErrorMapFields.FIRM_CODE_ERRORCODE;

    private static final String STARTTIME=ErrorMapFields.STARTTIME;
    private static final String STARTTIME_ERROR=ErrorMapFields.STARTTIME_ERROR;
    private static final String STARTTIME_ERRORCODE=ErrorMapFields.STARTTIME_ERRORCODE;
    private static final String ENDTIME=ErrorMapFields.ENDTIME;
    private static final String ENDTIME_ERROR=ErrorMapFields.ENDTIME_ERROR;
    private static final String ENDTIME_ERRORCODE=ErrorMapFields.ENDTIME_ERRORCODE;

    private static final String LOGINTIME=ErrorMapFields.LOGINTIME;
    private static final String LOGINTIME_ERROR=ErrorMapFields.LOGINTIME_ERROR;
    private static final String LOGINTIME_ERRORCODE=ErrorMapFields.LOGINTIME_ERRORCODE;
    private static final String LOGOUTTIME=ErrorMapFields.LOGOUTTIME;
    private static final String LOGOUTTIME_ERROR=ErrorMapFields.LOGOUTTIME_ERROR;
    private static final String LOGOUTTIME_ERRORCODE=ErrorMapFields.LOGOUTTIME_ERRORCODE;

    public static Map<String, Object> dataValidation( Map<String, String> map){
        if(map == null){
            return null;
        }

        Map<String,Object> errorMap = new HashMap<String,Object>();
        //验证手机号码
        sjhmValidation(map,errorMap);
        //验证MAC
        macValidation(map,errorMap);
        //验证经纬度
        longlaitValidation(map,errorMap);

        //定义自己的清洗规则

        //TODO 大小写统一
        //TODO 时间类型统一
        //TODO 数据字段统一
        //TODO 业务字段转换
        //TODO 数据矫正
        //TODO 验证MAC不能为空
        //TODO 验证IMSI不能为空
        //TODO 验证 QQ IMSI IMEI
        //TODO 验证DEVICENUM是否为空 为空返回错误
        /*devicenumValidation(map,errorMap);
        //TODO 验证CAPTURETIME是否为空 为空过滤   不为10,14位数字过滤
        capturetimeValidation(map,errorMap);
        //TODO 验证EMAIL
        emailValidation(map,errorMap);
        //TODO 验证STARTTIME ENDTIME LOGINTIME LOGOUTTIME
        timeValidation(map,errorMap);
        */
        return errorMap;
    }

    /**
     * 手机号码验证
     * @param map
     * @param errorMap
     */
    public static void sjhmValidation(Map<String, String> map,Map<String,Object> errorMap){
        if(map.containsKey("phone")){
            String sjhm=map.get("phone");
            //调用正则做手机号码验证,是否是正确的一个,检验
            boolean ismobile = Validation.isMobile(sjhm);
            if(!ismobile){
                errorMap.put(SJHM,sjhm);
                errorMap.put(SJHM_ERROR,SJHM_ERRORCODE);
            }
        }
    }

    //TODO QQ验证  10002  QQ编码 1030001    需要根据DATATYPE来判断数据类型的一起验证
    public static void virtualValidation(String dataType, Map<String, String> map,Map<String,Object> errorMap){

        //TODO USERNAME验证  10023  长度》=2
        if(map.containsKey(ErrorMapFields.USERNAME)){
            String username=map.get(ErrorMapFields.USERNAME);
            if(StringUtils.isNotBlank(username)){
                if(username.length()<2){
                    errorMap.put(ErrorMapFields.USERNAME,username);
                    errorMap.put(ErrorMapFields.USERNAME_ERROR,ErrorMapFields.USERNAME_ERRORCODE);
                }
            }
        }

        //TODO QQ验证  10002  QQ编码 1030001
        if("1030001".equals(dataType)&& map.containsKey(USERNAME)){
            String qqnum= map.get(USERNAME);
            boolean bool = Validation.isQQ(qqnum);
            if(!bool){
                errorMap.put(QQ,qqnum);
                errorMap.put(QQ_ERROR,QQ_ERRORCODE);
            }
        }

        //TODO IMSI验证  10005  IMSI编码 1429997
        if("1429997".equals(dataType)&& map.containsKey(IMSI)){
            String imsi= map.get(IMSI);
            boolean bool = Validation.isIMSI(imsi);
            if(!bool){
                errorMap.put(IMSI,imsi);
                errorMap.put(IMSI_ERROR,IMSI_ERRORCODE);
            }
        }

        //TODO IMEI验证  10006  IMEI编码 1429998
        if("1429998".equals(dataType)&& map.containsKey(IMEI)){
            String imei= map.get(IMEI);
            boolean bool = Validation.isIMEI(imei);
            if(!bool){
                errorMap.put(IMEI,imei);
                errorMap.put(IMEI_ERROR,IMEI_ERRORCODE);
            }
        }
    }

    //MAC验证  10003
    public static void macValidation( Map<String, String> map,Map<String,Object> errorMap){
        if(map == null){
            return ;
        }
        if(map.containsKey("phone_mac")){
            String mac=map.get("phone_mac");
            if(StringUtils.isNotBlank(mac)){
                boolean bool = Validation.isMac(mac);
                if(!bool){
                    LOG.info("MAC验证失败");
                    errorMap.put(MAC,mac);
                    errorMap.put(MAC_ERROR,MAC_ERRORCODE);
                }
            }else{
                LOG.info("MAC验证失败");
                errorMap.put(MAC,mac);
                errorMap.put(MAC_ERROR,MAC_ERRORCODE);
            }
        }
    }

    /**
     * TODO DEVICENUM 验证 为空过滤
     * @param map
     * @param errorMap
     */
    public static void devicenumValidation( Map<String, String> map,Map<String,Object> errorMap){
        if(map == null){
            return ;
        }
        if(map.containsKey("device_number")){
            String devicenum=map.get("device_number");
            if(StringUtils.isBlank(devicenum)){
                errorMap.put(DEVICENUM,"设备编码不能为空");
                errorMap.put(DEVICENUM_ERROR,DEVICENUM_ERRORCODE);
            }
        }
    }

    /**
     * TODO CAPTURETIME验证 为空过滤  10019  验证时间长度为10或14位
     * @param map
     * @param errorMap
     */
    public static void capturetimeValidation( Map<String, String> map,Map<String,Object> errorMap){
        if(map == null){
            return ;
        }
        if(map.containsKey(CAPTURETIME)){
            String capturetime=map.get(CAPTURETIME);
            if(StringUtils.isBlank(capturetime)){
                errorMap.put(CAPTURETIME,"CAPTURETIME不能为空");
                errorMap.put(CAPTURETIME_ERROR,CAPTURETIME_ERRORCODE);
            }else{
                boolean bool = Validation.isCAPTURETIME(capturetime);
                if(!bool){
                    errorMap.put(CAPTURETIME,capturetime);
                    errorMap.put(CAPTURETIME_ERROR,CAPTURETIME_ERRORCODE);
                }
            }
        }
    }

    //TODO EMAIL验证 为空过滤 为错误过滤  10004  通过TABLE取USERNAME验证
    public static void emailValidation( Map<String, String> map,Map<String,Object> errorMap){
        if(map == null){
            return ;
        }
        if(map.get("TABLE").equals(EMAIL)){
            String email=map.get(USERNAME);
            if(StringUtils.isNotBlank(email)){
                boolean bool = Validation.isEmail(email);
                if(!bool){
                    errorMap.put(EMAIL,email);
                    errorMap.put(EMAIL_ERROR,EMAIL_ERRORCODE);
                }
            }else{
                errorMap.put(EMAIL,"EMAIL不能为空");
                errorMap.put(EMAIL_ERROR,EMAIL_ERRORCODE);
            }
        }
    }

    //TODO EMAIL验证 为空过滤 为错误过滤  10004  通过TABLE取USERNAME验证
    public static void timeValidation( Map<String, String> map,Map<String,Object> errorMap){
        if(map == null){
            return ;
        }
        if(map.containsKey(STARTTIME)&&map.containsKey(ENDTIME)){
            String starttime=map.get(STARTTIME);
            String endtime=map.get(ENDTIME);
            if(StringUtils.isBlank(starttime)&&StringUtils.isBlank(endtime)){
                errorMap.put(STARTTIME,"STARTTIME和ENDTIME不能同时为空");
                errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
                errorMap.put(ENDTIME,"STARTTIME和ENDTIME不能同时为空");
                errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
            }else{
                Boolean bool1 = istime(starttime, STARTTIME, STARTTIME_ERROR, STARTTIME_ERRORCODE, errorMap);
                Boolean bool2 = istime(endtime, ENDTIME, ENDTIME_ERROR, ENDTIME_ERRORCODE, errorMap);

                if(bool1&&bool2&&(starttime.length()!=endtime.length())){
                    errorMap.put(STARTTIME,"STARTTIME和ENDTIME长度不等 STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
                    errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
                    errorMap.put(ENDTIME,"STARTTIME和ENDTIME长度不等 STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
                    errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
                }
                else if(bool1&&bool2&&(endtime.compareTo(starttime)<0)){
                    errorMap.put(STARTTIME,"ENDTIME必须大于STARTTIME  STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
                    errorMap.put(STARTTIME_ERROR,STARTTIME_ERRORCODE);
                    errorMap.put(ENDTIME,"ENDTIME必须大于STARTTIME  STARTTIME:"+starttime + "\t"+"ENDTIME:" + endtime);
                    errorMap.put(ENDTIME_ERROR,ENDTIME_ERRORCODE);
                }
            }

        }else if(map.containsKey(LOGINTIME)&&map.containsKey(LOGOUTTIME)){

            String logintime=map.get(LOGINTIME);
            String logouttime=map.get(LOGOUTTIME);

            if(StringUtils.isBlank(logintime)&&StringUtils.isBlank(logouttime)){
                errorMap.put(LOGINTIME,"LOGINTIME和LOGOUTTIME不能同时为空");
                errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
                errorMap.put(LOGOUTTIME,"LOGINTIME和LOGOUTTIME不能同时为空");
                errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
            }else{
                Boolean bool1 = istime(logintime, LOGINTIME, LOGINTIME_ERROR, LOGINTIME_ERRORCODE, errorMap);
                Boolean bool2 = istime(logouttime, LOGOUTTIME, LOGOUTTIME_ERROR, LOGOUTTIME_ERRORCODE, errorMap);

                if(bool1&&bool2&&(logintime.length()!=logouttime.length())){
                    errorMap.put(LOGINTIME,"LOGOUTTIME LOGINTIME长度不等 LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
                    errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
                    errorMap.put(LOGOUTTIME,"LOGOUTTIME LOGINTIME长度不等 LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
                    errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
                }
                else if(bool1&&bool2&&(logouttime.compareTo(logintime)<0)){
                    errorMap.put(LOGINTIME,"LOGOUTTIME必须大于LOGINTIME  LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
                    errorMap.put(LOGINTIME_ERROR,LOGINTIME_ERRORCODE);
                    errorMap.put(LOGOUTTIME,"LOGOUTTIME必须大于LOGINTIME  LOGINTIME:"+logintime + "\t"+"LOGOUTTIME:" + logouttime);
                    errorMap.put(LOGOUTTIME_ERROR,LOGOUTTIME_ERRORCODE);
                }
            }
        }
    }

    //TODO AUTH_TYPE验证  为空过滤 为错误过滤  10020
    public static void authtypeValidation( Map<String, String> map,Map<String,Object> errorMap){
        if(map == null){
            return ;
        }
        String fileName=map.get(MapFields.FILENAME);

        if(fileName.split("_").length<=2){
            map = null;
            return;
        }

        if(StringUtils.isNotBlank(fileName)){
            if("bh".equals(fileName.split("_")[2])||"wy".equals(fileName.split("_")[2])||"yc".equals(fileName.split("_")[2])){
                return ;
            }else if(map.containsKey(AUTH_TYPE)){
                String authtype=map.get(AUTH_TYPE);
                if(StringUtils.isNotBlank(authtype)){
                    if(listAuthType.contains(authtype)){
                        if("1020004".equals(authtype)){
                            String sjhm=map.get(MapFields.AUTH_ACCOUNT);

                            boolean ismobile = Validation.isMobile(sjhm);
                            if(!ismobile){
                                errorMap.put(SJHM,sjhm);
                                errorMap.put(SJHM_ERROR,SJHM_ERRORCODE);
                            }
                        }
                        if("1020002".equals(authtype)){

                            String mac=map.get(MapFields.AUTH_ACCOUNT);
                            boolean ismac = Validation.isMac(mac);
                            if(!ismac){
                                errorMap.put(MAC,mac);
                                errorMap.put(MAC_ERROR,MAC_ERRORCODE);
                            }
                        }
                    }else{
                        errorMap.put(AUTH_TYPE,"AUTHTYPE_LIST 影射里没有"+ "\t"+ "["+ authtype+"]");
                        errorMap.put(AUTH_TYPE_ERROR,AUTH_TYPE_ERRORCODE);
                    }
                }else{
                    errorMap.put(AUTH_TYPE,"AUTH_TYPE 不能为空");
                    errorMap.put(AUTH_TYPE_ERROR,AUTH_TYPE_ERRORCODE);
                }
            }
        }
    }

    private static final String LONGITUDE = "longitude";
    private static final String LATITUDE = "latitude";
    private static final String LONGITUDE_ERROR=ErrorMapFields.LONGITUDE_ERROR;
    private static final String LONGITUDE_ERRORCODE=ErrorMapFields.LONGITUDE_ERRORCODE;
    private static final String LATITUDE_ERROR=ErrorMapFields.LATITUDE_ERROR;
    private static final String LATITUDE_ERRORCODE=ErrorMapFields.LATITUDE_ERRORCODE;

    /**
     * 经纬度验证  错误过滤  10012  10013
     * @param map
     * @param errorMap
     */
    public static void longlaitValidation( Map<String, String> map,Map<String,Object> errorMap){
        if(map == null){
            return ;
        }
        if(map.containsKey(LONGITUDE)&&map.containsKey(LATITUDE)){
            String longitude=map.get(LONGITUDE);
            String latitude=map.get(LATITUDE);

            boolean bool1 = Validation.isLONGITUDE(longitude);
            boolean bool2 = Validation.isLATITUDE(latitude);

            if(!bool1){
                errorMap.put(LONGITUDE,longitude);
                errorMap.put(LONGITUDE_ERROR,LONGITUDE_ERRORCODE);
            }
            if(!bool2){
                errorMap.put(LATITUDE,latitude);
                errorMap.put(LATITUDE_ERROR,LATITUDE_ERRORCODE);
            }
        }
    }

    public static Boolean istime(String time,String str1,String str2,String str3,Map<String,Object> errorMap){

        if(StringUtils.isNotBlank(time)){
            boolean bool = Validation.isCAPTURETIME(time);
            if(!bool){
                errorMap.put(str1,time);
                errorMap.put(str2,str3);
                return false;
            }
            return true;
        }
        return false;
    }
}

9、配置CDH上的Agent文件(与FolderSource等代码中读取的配置项相对应)

Flume配置文件

Flume配置:

tier1.sources= source1
tier1.channels=channel1
tier1.sinks=sink1

#定义source1
tier1.sources.source1.type = com.hsiehchou.flume.source.FolderSource

#读取文件之后睡眠时间
tier1.sources.source1.sleeptime=5
tier1.sources.source1.filenum=3000
tier1.sources.source1.dirs =/usr/chl/data/filedir/
tier1.sources.source1.successfile=/usr/chl/data/filedir_successful/
tier1.sources.source1.deserializer.outputCharset=UTF-8
tier1.sources.source1.channels = channel1

# 定义拦截器1
tier1.sources.source1.interceptors=i1
tier1.sources.source1.interceptors.i1.type=com.hsiehchou.flume.interceptor.DataCleanInterceptor$Builder

#定义channel
tier1.channels.channel1.type = memory
tier1.channels.channel1.keep-alive= 300
tier1.channels.channel1.capacity = 1000000
tier1.channels.channel1.transactionCapacity = 5000
tier1.channels.channel1.byteCapacityBufferPercentage = 200
tier1.channels.channel1.byteCapacity = 80000

#定义sink1
tier1.sinks.sink1.type = com.hsiehchou.flume.sink.KafkaSink
tier1.sinks.sink1.kafkatopics = chl_test7
tier1.sinks.sink1.channel = channel1

ftp监控文件

Flume source 不断监控 FTP 文件目录,数据经自定义拦截器清洗后推送到 Flume channel,再由自定义的 KafkaSink 下沉到 Kafka。

10、flume打包到服务器执行

flume插件目录

不能放在默认的/usr/lib/flume-ng/plugins.d下面

mkdir -p /var/lib/flume-ng/plugins.d/chl/lib
mkdir -p /usr/chl/data/filedir/
mkdir -p /usr/chl/data/filedir_successful/

要将目录权限设置为777:flume启动时是以flume用户的身份运行的,所以要更改目录权限
chmod 777 /usr/chl/data/filedir/

kafka-topics --zookeeper hadoop1:2181 --topic chl_test7 --create --replication-factor 1 --partitions 3

kafka-topics --zookeeper hadoop1:2181 --list

kafka-topics --zookeeper hadoop1:2181 --delete --topic chl_test7

消费kafka数据,验证写入是否成功:
kafka-console-consumer --bootstrap-server hadoop1:9092 --topic chl_test7 --from-beginning

六、Kafka开发

xz_bigdata_kafka

1、pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata2</artifactId>
        <groupId>com.hsiehchou</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_kafka</artifactId>

    <name>xz_bigdata_kafka</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <poi.version>3.14</poi.version>
        <kafka.version>0.9.0-kafka-2.0.2</kafka.version>
        <mysql.connector.version>5.1.46</mysql.connector.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_resources</artifactId>
            <version>1.0-SNAPSHOT</version>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>${zookeeper.version}-${cdh.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.10</artifactId>
            <version>${kafka.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>${poi.version}</version>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <dependency>
            <artifactId>scala-reflect</artifactId>
            <groupId>org.scala-lang</groupId>
            <version>${scala.version}</version>
        </dependency>
    </dependencies>

</project>

2、config/KafkaConfig.java—kafka配置文件 解析器

package com.hsiehchou.kafka.config;

import com.hsiehchou.common.config.ConfigUtil;
import kafka.producer.ProducerConfig;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

/**
 * kafka配置文件 解析器
 */
public class KafkaConfig {

    private static final Logger LOG = LoggerFactory.getLogger(KafkaConfig.class);

    private static final String DEFAULT_CONFIG_PATH = "kafka/kafka-server-config.properties";

    private volatile static KafkaConfig kafkaConfig = null;

    private ProducerConfig config;
    private Properties properties;

    private KafkaConfig() throws IOException{
        try {
            properties = ConfigUtil.getInstance().getProperties(DEFAULT_CONFIG_PATH);
        } catch (Exception e) {
            IOException ioException = new IOException();
            ioException.addSuppressed(e);
            throw ioException;
        }
        config = new ProducerConfig(properties);
    }

    public static KafkaConfig getInstance(){
        if(kafkaConfig == null){
            synchronized (KafkaConfig.class) {
                if(kafkaConfig == null){
                    try {
                        kafkaConfig = new KafkaConfig();
                    } catch (IOException e) {
                        LOG.error("实例化kafkaConfig失败", e);
                    }
                }
            }
        }
        return kafkaConfig;
    }

    public ProducerConfig getProducerConfig(){
        return config;
    }

    /**
      * 获取当前时间的字符串       格式为:yyyy-MM-dd HH:mm:ss
      * @return String
     */
    public static String nowStr(){
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format( new Date() );
    }
}

3、producer/StringProducer.java—生产者

package com.hsiehchou.kafka.producer;

import com.hsiehchou.common.thread.ThreadPoolManager;
import com.hsiehchou.kafka.config.KafkaConfig;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

public class StringProducer {
    private static final Logger LOG = LoggerFactory.getLogger(StringProducer.class);

    public static void main(String[] args) {
        StringProducer.producer("chl_test2","{\"rksj\":\"1558177156\",\"latitude\":\"24.000000\",\"imsi\":\"000000000000000\",\"accept_message\":\"\",\"phone_mac\":\"aa-aa-aa-aa-aa-aa\",\"device_mac\":\"bb-bb-bb-bb-bb-bb\",\"message_time\":\"1789098762\",\"filename\":\"wechat_source1_1111119.txt\",\"absolute_filename\":\"/usr/chl/data/filedir_successful/2019-05-18/data/filedir/wechat_source1_1111119.txt\",\"phone\":\"18609765432\",\"device_number\":\"32109231\",\"imei\":\"000000000000000\",\"id\":\"1792d6529e2143fa85717e706403c83c\",\"collect_time\":\"1557305988\",\"send_message\":\"\",\"table\":\"wechat\",\"object_username\":\"judy\",\"longitude\":\"23.000000\",\"username\":\"andiy\"}");
    }

    private static int threadSize = 6;

    /**
     * 生产单条消息 单条推送
     * @param topic
     * @param recourd
     */
    public static void producer(String topic,String recourd){
        Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
        KeyedMessage<String, String> keyedMessage = new KeyedMessage<>(topic, recourd);
        producer.send(keyedMessage);
        LOG.info("发送数据"+recourd+"到kafka成功");
        producer.close();
    }

    /**
     * 批量推送
     * @param topic
     * @param listRecourd
     */
    public static void producerList(String topic,List<String> listRecourd){
        Producer<String, String> producer = new Producer<>(KafkaConfig.getInstance().getProducerConfig());
        List<KeyedMessage<String, String>> listKeyedMessage= new ArrayList<>();
        listRecourd.forEach(recourd->{
            listKeyedMessage.add(new KeyedMessage<>(topic, recourd));
        });
        producer.send(listKeyedMessage);
        producer.close();
    }

    /**
     * 多线程推送
     * @param topic  kafka  topic
     * @param listMessage 消息
     * @throws Exception
     */
    public void producer(String topic,List<String> listMessage) throws Exception{
        //  int size = listMessage.size();

        List<List<String>> lists = splitList(listMessage, 5);
        int threadNum = lists.size();

        long t1 = System.currentTimeMillis();
        CountDownLatch cdl = new CountDownLatch(threadNum);

        //使用线程池
        ExecutorService executorService = ThreadPoolManager.getInstance().getExecutorService();
        LOG.info("开启 " + threadNum + " 个线程来向  topic " + topic + " 生产数据 . ");

        for (int i = 0; i < threadNum; i++) {
            try {
                executorService.execute(new ProducerTask(topic,lists.get(i)));
            } catch (Exception e) {
                LOG.error("", e);
            }
        }
        cdl.await();
        long t = System.currentTimeMillis() - t1;
        LOG.info(  " 一共耗时  :" + t + "  毫秒 ... " );
        executorService.shutdown();
    }

    /**
     * 拆分消息集合,计算使用多少个线程执行运算
     * @param mtList
     */
    public static List<List<String>> splitList(List<String> mtList, int splitSize){
        if(mtList == null || mtList.size()==0){
            return null;
        }
        int length = mtList.size();

        // 计算可以分成多少组
        int num = ( length + splitSize - 1 )/splitSize ;
        List<List<String>> spiltList = new ArrayList<>(num);

        for (int i = 0; i < num; i++) {
            // 开始位置
            int fromIndex = i * splitSize;
            // 结束位置
            int toIndex = (i+1) * splitSize < length ? ( i+1 ) * splitSize : length ;
            spiltList.add(mtList.subList(fromIndex,toIndex)) ;
        }
        return  spiltList;
    }

    class ProducerTask implements Runnable{
        private String topic;
        private List<String> listRecourd;
        private CountDownLatch cdl;
        public ProducerTask( String topic, List<String> listRecourd, CountDownLatch cdl){
            this.topic = topic;
            this.listRecourd = listRecourd;
            this.cdl = cdl;
        }
        public void run() {
            try {
                producerList(topic,listRecourd);
            } finally {
                //任务结束后递减计数,保证producer方法中的cdl.await()能够返回
                cdl.countDown();
            }
        }
    }

   /* public static void producer(String topic,List<KeyedMessage<String,String>> listMessage) throws Exception{
        int size = listMessage.size();
        int threads = ( ( size - 1  ) / threadSize ) + 1;

        long t1 = System.currentTimeMillis();
        CountDownLatch cdl = new CountDownLatch(threads);
        //使用线程池
        ExecutorService executorService = ThreadPoolManager.getInstance().getExecutorService();
        LOG.info("开启 " + threads + " 个线程来向  topic " + topic + " 生产数据 . ");
      *//*  for( int i = 0 ; i < threads ; i++ ){
            executorService.execute( new StringProducer.ChildProducer( start , end ,  topic , id,  cdl ));
        }*//*
        cdl.await();
        long t = System.currentTimeMillis() - t1;
        LOG.info(  " 一共耗时  :" + t + "  毫秒 ... " );
        executorService.shutdown();
    }

    static class ChildProducer implements Runnable{

        public ChildProducer( int start , int end ,  String topic , String id,  CountDownLatch cdl ){


        }

        public void run() {

        }
    }
    */

}

七、Spark—kafka2es开发

Cloudera查找对应的maven依赖地址

https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh5_maven_repo_514x.html#concept_flv_nwn_yk

SparkStreaming + Kafka 有两种模式:receiver模式和Direct模式。

SparkStreaming + Kafka receiver模式理解

kafka的receiver模式

receiver模式理解
在SparkStreaming程序运行起来后,Executor中会有receiver tasks接收kafka推送过来的数据。数据会被持久化,默认级别为MEMORY_AND_DISK_SER_2,这个级别也可以修改。receiver task对接收过来的数据进行存储和备份,这个过程会有节点之间的数据传输。备份完成后去zookeeper中更新消费偏移量,然后向Driver中的receiver tracker汇报数据的位置。最后Driver根据数据本地化将task分发到不同节点上执行。

receiver模式中存在的问题
当Driver进程挂掉后,其下的Executor也都会被杀掉。如果zookeeper中的消费偏移量已经更新,而Executor内存里尚未处理的数据随进程一起丢失,重启后按新偏移量继续消费就再也找不回这批数据,相当于丢失数据。

如何解决这个问题?
开启WAL(write ahead log)预写日志机制:在把接收到的数据备份到其他节点的同时,再往HDFS上备份一份(此时需要将接收数据的持久化级别降为MEMORY_AND_DISK),这样就能保证数据的安全性。不过写HDFS比较消耗性能,必须等数据备份完成后才能更新zookeeper偏移量、汇报数据位置等,因此会增加job的执行时间,提高任务的延迟。

注意
1)开启WAL之后,接收数据的持久化级别要降级,存在效率问题
2)开启WAL必须设置checkpoint目录
3)开启WAL(write ahead log)后,会往HDFS中多备份一份数据(最小配置示意见下方代码)
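下面是开启WAL的一个最小配置示意(仅为示意,checkpoint路径、批次间隔、topic等均为假设值):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

//开启receiver模式的WAL预写日志
val sparkConf = new SparkConf()
  .setAppName("receiverWithWAL")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(sparkConf, Seconds(30))

//开启WAL必须设置checkpoint目录(这里的HDFS路径为假设值)
ssc.checkpoint("hdfs://hadoop1:8020/user/chl/sparkstreaming/checkpoint")

//接收数据时持久化级别降为MEMORY_AND_DISK,避免与WAL重复做多副本
//val stream = KafkaUtils.createStream(ssc, "hadoop1:2181", "demo_group", Map("chl_test7" -> 1), StorageLevel.MEMORY_AND_DISK)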

SparkStreaming + Kafka direct模式理解

kafka的direct模式

  1. 简化数据处理流程
  2. 自己定义offset存储,可以保证数据零丢失,但可能存在重复消费问题(需要自己保证消费的幂等性)
  3. 不经过receiver接收数据,由Spark直接去kafka中拉取(核心API的最小示意见下方代码,完整实现见后文的KafkaManager)
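下面是direct模式核心API的一个最小示意(broker地址、topic均为假设值;把offset写回zookeeper的完整实现见后文的KafkaManager):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val sparkConf = new SparkConf().setAppName("directDemo")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val kafkaParams = Map(
  "metadata.broker.list" -> "hadoop1:9092,hadoop2:9092,hadoop3:9092",
  "auto.offset.reset" -> "smallest")

//不经过receiver,直接按kafka分区拉取数据
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("chl_test7"))

stream.foreachRDD { rdd =>
  //每个批次的offset信息保存在RDD中,可自行写到zookeeper等外部存储,实现精确的消费控制
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach(o => println(s"${o.topic}-${o.partition}: ${o.fromOffset} -> ${o.untilOffset}"))
}

ssc.start()
ssc.awaitTermination()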

1、spark下的pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata2</artifactId>
        <groupId>com.hsiehchou</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_spark</artifactId>

    <name>xz_bigdata_spark</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <spark.version>1.6.0</spark.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_resources</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_es</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_redis</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_hbase</artifactId>
            <version>1.0-SNAPSHOT</version>
            <exclusions>
                <exclusion>
                    <artifactId>servlet-api</artifactId>
                    <groupId>javax.servlet</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>gson</artifactId>
                    <groupId>com.google.code.gson</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}-${cdh.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>${spark.version}-${cdh.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>httpcore</artifactId>
                    <groupId>org.apache.httpcomponents</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>httpclient</artifactId>
                    <groupId>org.apache.httpcomponents</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-api</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>gson</artifactId>
                    <groupId>com.google.code.gson</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>${spark.version}-${cdh.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.10</artifactId>
            <version>${spark.version}-${cdh.version}</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-13_2.10</artifactId>
            <version>6.2.3</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin><!--打包依赖的jar包-->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <outputDirectory>${project.build.directory}/lib</outputDirectory>
                    <excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
                    <stripVersion>false</stripVersion> <!-- 去除版本信息 -->
                </configuration>

                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <!-- 拷贝项目依赖包到lib/目录下 -->
                            <outputDirectory>${project.build.directory}/jars</outputDirectory>
                            <excludeTransitive>false</excludeTransitive>
                            <stripVersion>false</stripVersion>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

2、spark中的文件结构

spark中的文件结构

让IDEA能新建scala class

点击”+”号,选择Scala SDK,点击Browse,选择本地下载的scala-sdk-2.10.4

3、xz_bigdata_spark/spark/common

SparkContextFactory.scala

package com.hsiehchou.spark.common

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{Accumulator, SparkContext}

object SparkContextFactory {

  def newSparkBatchContext(appName:String = "sparkBatch") : SparkContext = {
    val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
    new SparkContext(sparkConf)
  }

  def newSparkLocalBatchContext(appName:String = "sparkLocalBatch" , threads : Int = 2) : SparkContext = {
    val sparkConf = SparkConfFactory.newSparkLocalConf(appName, threads)
    //如需额外配置,可在此处继续调用sparkConf.set(...)
    new SparkContext(sparkConf)
  }

  def getAccumulator(appName:String = "sparkBatch") : Accumulator[Int] = {
    val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
    val accumulator: Accumulator[Int] = new SparkContext(sparkConf).accumulator(0,"")
    accumulator
  }

  /**
    * 创建本地流streamingContext
    * @param appName             appName
    * @param batchInterval      多少秒读取一次
    * @param threads            开启多少个线程
    * @return
    */
  def newSparkLocalStreamingContext(appName:String = "sparkStreaming" ,
                                    batchInterval:Long = 30L ,
                                    threads : Int = 4) : StreamingContext = {
    val sparkConf =  SparkConfFactory.newSparkLocalConf(appName, threads)
    // sparkConf.set("spark.streaming.receiver.maxRate","10000")
    sparkConf.set("spark.streaming.kafka.maxRatePerPartition","1")
    new StreamingContext(sparkConf, Seconds(batchInterval))
  }

  /**
    * 创建集群模式streamingContext
    * 这里不设置线程数,在submit中指定
    * @param appName
    * @param batchInterval
    * @return
    */
  def newSparkStreamingContext(appName:String = "sparkStreaming" , batchInterval:Long = 30L) : StreamingContext = {
    val sparkConf = SparkConfFactory.newSparkStreamingConf(appName)
    new StreamingContext(sparkConf, Seconds(batchInterval))
  }

  def startSparkStreaming(ssc:StreamingContext){
    ssc.start()
      ssc.awaitTermination()
      ssc.stop()
  }
}
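下面是SparkContextFactory的一个调用示意(appName、批次间隔、线程数均为假设值):

import com.hsiehchou.spark.common.SparkContextFactory

//本地调试:创建本地StreamingContext,10秒一个批次,2个线程
val ssc = SparkContextFactory.newSparkLocalStreamingContext("localDebug", batchInterval = 10L, threads = 2)
//......在此基础上创建DStream并编写处理逻辑......
//启动并阻塞等待
SparkContextFactory.startSparkStreaming(ssc)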

convert/DataConvert.scala

package com.hsiehchou.spark.common.convert

import java.util

import com.hsiehchou.common.config.ConfigUtil
import org.apache.spark.Logging

import scala.collection.JavaConversions._

/**
  * 数据类型转换
  */
object DataConvert extends Serializable with Logging {

  val fieldMappingPath = "es/mapping/fieldmapping.properties"

  private val typeFieldMap: util.HashMap[String, util.HashMap[String, String]] = getEsFieldtypeMap()

  /**
    * 将Map<String,String>转化为Map<String,Object>
    */
  def strMap2esObjectMap(map:util.Map[String,String]):util.Map[String,Object] ={

    //获取配置文件中的数据类型
    val dataType = map.get("table")

    //获取配置文件中的数据类型的 字段类型
    val fieldMap = typeFieldMap.get(dataType)

    //获取数据类型的所有字段,配置文件里的字段
    val keySet = fieldMap.keySet()

    //var objectMap:util.HashMap[String,Object] = new util.HashMap[String,Object]()
    var objectMap = new java.util.HashMap[String, Object]()

    //数据里的字段
    val set = map.keySet().iterator()

    try {
      //遍历真实数据的所有字段
      while (set.hasNext()) {
        val key = set.next()
        var dataType:String = "string"

        //如果在配置文件中的key包含真实数据的key
        if (keySet.contains(key)) {
          //则获取真实数据字段的数据类型
          dataType = fieldMap.get(key)
        }
        dataType match {
          case "long" => objectMap = BaseDataConvert.mapString2Long(map, key, objectMap)
          case "string" => objectMap = BaseDataConvert.mapString2String(map, key, objectMap)
          case "double" => objectMap = BaseDataConvert.mapString2Double(map, key, objectMap)
          case _ => objectMap = BaseDataConvert.mapString2String(map, key, objectMap)
        }
      }
    }catch {
      case e: Exception => logInfo("转换异常", e)
    }
    println("转换后" + objectMap)
    objectMap
  }

  /**
    * 读取 "es/mapping/fieldmapping.properties 配置文件
    * 主要作用是将 真实数据 根据配置来作数据类型转换 转换为和ES mapping结构保持一致
    * @return
    */
  def getEsFieldtypeMap(): util.HashMap[String, util.HashMap[String, String]] = {

    // ["wechat":["phone_mac":"string","latitude":"long"]]

    //定义返回Map
    val mapMap = new util.HashMap[String, util.HashMap[String, String]]
    val properties = ConfigUtil.getInstance().getProperties(fieldMappingPath)
    val tables = properties.get("tables").toString.split(",")
    val tableFields = properties.keySet()

    tables.foreach(table => {
      val map = new util.HashMap[String, String]()
      tableFields.foreach(tableField => {
        if (tableField.toString.startsWith(table)) {
          val key = tableField.toString.split("\\.")(1)
          val value = properties.get(tableField).toString
          map.put(key, value)
        }
      })
      mapMap.put(table, map)
    })
    mapMap
  }
}
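下面是strMap2esObjectMap的一个调用示意(假设fieldmapping.properties中为wechat配置了latitude=double、collect_time=long等字段类型):

import com.hsiehchou.spark.common.convert.DataConvert

//把一条wechat数据(Map[String,String])转换为写ES用的Map[String,Object]
val record = new java.util.HashMap[String, String]()
record.put("table", "wechat")
record.put("phone", "18609765432")
record.put("latitude", "24.000000")
record.put("collect_time", "1557305988")

val esMap = DataConvert.strMap2esObjectMap(record)
//按上述假设,latitude会被转成Double、collect_time会被转成Long,未配置的字段默认按String处理
println(esMap)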

scala中的scala文件结构

4、org/apache/spark/streaming/kafka/KafkaManager.scala

构建Kafka直连流时会用到KafkaCluster,它位于org.apache.spark.streaming.kafka包下,且是私有类,只能在spark包内使用。因此在自己的工程中新建相同的包目录结构,就可以引用它了,如下图所示:

为什么要新建org.apache.spark.streaming.kafka

package org.apache.spark.streaming.kafka

import com.alibaba.fastjson.TypeReference
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.{Decoder, StringDecoder}
import org.apache.spark.Logging
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.{DStream, InputDStream}

import scala.reflect.ClassTag

/**
  * 包名说明 :KafkaCluster是私有类,只能在spark包中使用,
  *           所以包名保持和 KafkaCluster 一致才能调用
  * @param kafkaParams
  * @param autoUpdateoffset
  */
class KafkaManager(val kafkaParams:Map[String, String],
                   val autoUpdateoffset:Boolean =true) extends Serializable with Logging {

  //构造一个KafkaCluster
  @transient
  private var cluster = new KafkaCluster(kafkaParams)

  //定义一个单例
  def kc(): KafkaCluster = {
    if (cluster == null) {
      cluster = new KafkaCluster(kafkaParams)
    }
    cluster
  }

  /**
    * 泛型流读取器
    * @param ssc
    * @param topics kafka topics,多个topic按","分割
    * @tparam K  泛型 K
    * @tparam V  泛型 V
    * @tparam KD scala泛型 KD <: Decoder[K] 说明KD 的类型必须是Decoder[K]的子类型  上下界
    * @tparam VD scala泛型 VD <: Decoder[V] 说明VD 的类型必须是Decoder[V]的子类型  上下界
    * @return
    */
  def createDirectStream[K: ClassTag, V: ClassTag,
  KD <: Decoder[K] : ClassTag,
  VD <: Decoder[V] : ClassTag](ssc: StreamingContext, topics: Set[String]): InputDStream[(K, V)] = {

    //获取消费者组ID
    //val groupId = "test"
    val groupId = kafkaParams.get("group.id").getOrElse("default")

    // 在zookeeper上读取offsets前先根据实际情况更新offsets
    setOrUpdateOffsets(topics, groupId)

    //把所有的offsets处理完成,就可以从zookeeper上读取offset开始消费message
    val messages = {
      //获取kafka分区信息  为了打印信息
      val partitionsE = kc.getPartitions(topics)
      require(partitionsE.isRight, s"获取 kafka topic ${topics}`s partition 失败。")
      val partitions = partitionsE.right.get
      println("打印分区信息")
      partitions.foreach(println(_))

      //获取分区的offset
      val consumerOffsetsE = kc.getConsumerOffsets(groupId, partitions)
      require(consumerOffsetsE.isRight, s"获取 kafka topic ${topics}`s consumer offsets 失败。")
      val consumerOffsets = consumerOffsetsE.right.get
      println("打印消费者分区偏移信息")
      consumerOffsets.foreach(println(_))
      //读取数据
      KafkaUtils.createDirectStream[K, V, KD, VD, (K, V)](
        ssc, kafkaParams, consumerOffsets, (mmd: MessageAndMetadata[K, V]) => (mmd.key, mmd.message))
    }

    if (autoUpdateoffset) {
      //更新offset
      messages.foreachRDD(rdd => {
        logInfo("RDD 消费成功,开始更新zookeeper上的偏移")
        updateZKOffsets(rdd)
      })
    }
    messages
  }


  /**
    * 创建数据流前,根据实际消费情况更新消费offsets
    * @param topics
    * @param groupId
    */
  private def setOrUpdateOffsets(topics: Set[String], groupId: String): Unit = {
    topics.foreach(topic => {
      //先获取Kafka offset信息  Kafka partions的节点信息
      //获取kafka本身的偏移量, Either类型可以认为就是封装了2种信息
      val partitionsE = kc.getPartitions(Set(topic))
      logInfo(partitionsE + "")
      //require(partitionsE.isRight, "获取partition失败")
      require(partitionsE.isRight, s"获取 kafka topic ${topic}`s partition 失败。")
      println("partitionsE=" + partitionsE)
      val partitions = partitionsE.right.get
      println("打印分区信息")
      partitions.foreach(println(_))

      //获取kafka partions最早的offsets
      val earliestLeader = kc.getEarliestLeaderOffsets(partitions)
      require(earliestLeader.isRight, "获取earliestLeader失败")
      val earliestLeaderOffsets = earliestLeader.right.get
      println("kafka最早的消息偏移量")
      earliestLeaderOffsets.foreach(println(_))

      //获取kafka最末尾的offsets
      val latestLeader = kc.getLatestLeaderOffsets(partitions)
      //require(latestLeader.isRight, "获取latestLeader失败")
      val latestLeaderOffsets = latestLeader.right.get
      println("kafka最末尾的消息偏移量")
      latestLeaderOffsets.foreach(println(_))

      //获取消费者的offsets
      val consumerOffsetsE = kc.getConsumerOffsets(groupId, partitions)

      //判断消费者是否消费过,消费者offset存在
      if (consumerOffsetsE.isRight) {
        /**
          * 如果zk上保存的offsets已经过时了,即kafka的定时清理策略已经将包含该offsets的文件删除。
          * 针对这种情况,只要判断一下zk上的consumerOffsets和earliestLeaderOffsets的大小,
          * 如果consumerOffsets比earliestLeaderOffsets还小的话,说明consumerOffsets已过时,
          * 这时把consumerOffsets更新为earliestLeaderOffsets
          */
        //如果消费过,则校验zk中记录的offset是否过时(前提是earliestLeader存在)
        if (earliestLeader.isRight) {
          //获取到最早的offset  也就是最小的offset
          require(earliestLeader.isRight, "获取earliestLeader失败")
          val earliestLeaderOffsets = earliestLeader.right.get
          //获取消费者组的offset
          val consumerOffsets = consumerOffsetsE.right.get
          // 将 consumerOffsets 和 earliestLeaderOffsets 的offsets 做比较
          // 可能只是存在部分分区consumerOffsets过时,所以只更新过时分区的consumerOffsets为earliestLeaderOffsets
          var offsets: Map[TopicAndPartition, Long] = Map()

          consumerOffsets.foreach({ case (tp, n) =>
            val earliestLeaderOffset = earliestLeaderOffsets(tp).offset
            //如果消费者的偏移小于kafka中最早的offset,那么将最早的offset更新到zk
            if (n < earliestLeaderOffset) {
              logWarning("consumer group:" + groupId + ",topic:" + tp.topic + ",partition:" + tp.partition +
                " offsets已经过时,更新为" + earliestLeaderOffset)
              offsets += (tp -> earliestLeaderOffset)
            }
          })
          //设置offsets
          setOffsets(groupId, offsets)
        }
      } else {
        //如果没有消费过(zookeeper中还没有该消费者组的信息),
        //则根据auto.offset.reset决定从头还是从最新位置开始消费,并把对应的offset写到zk中
        if (earliestLeader.isLeft)
          logError(s"${topic} hasConsumed but earliestLeaderOffsets is null。")

        //看是从头消费还是从末开始消费  smallest表示从头开始消费
        val reset = kafkaParams.get("auto.offset.reset").map(_.toLowerCase).getOrElse("smallest")

        //往zk中去写,构建消费者 偏移
        var leaderOffsets: Map[TopicAndPartition, Long] = Map.empty

        //从头消费
        if (reset.equals("smallest")) {
          //分为 存在 和 不存在 最早的消费记录 两种情况
          //如果kafka 最小偏移存在,则将消费者偏移设置为和kafka偏移一样
          if (earliestLeader.isRight) {
            leaderOffsets = earliestLeader.right.get.map {
              case (tp, offset) => (tp, offset.offset)
            }
          } else {
            //如果不存在,则从新构建偏移全部为0 offsets
            leaderOffsets = partitions.map(tp => (tp, 0L)).toMap
          }
        } else {
          //直接获取最新的offset
          leaderOffsets = kc.getLatestLeaderOffsets(partitions).right.get.map {
            case (tp, offset) => (tp, offset.offset)
          }
        }
        //设置offsets 写到zk中
        setOffsets(groupId, leaderOffsets)
      }
    })
  }

  /**
    * 设置消费者组的offsets
    * @param groupId
    * @param offsets
    */
  private def setOffsets(groupId: String, offsets: Map[TopicAndPartition, Long]): Unit = {
    if (offsets.nonEmpty) {
      //更新offset
      val o = kc.setConsumerOffsets(groupId, offsets)
      logInfo(s"更新zookeeper中消费组为:${groupId} 的 topic offset信息为: ${offsets}")
      if (o.isLeft) {
        logError(s"Error updating the offset to Kafka cluster: ${o.left.get}")
      }
    }
  }

  /**
    * 通过spark的RDD 更新zookeeper上的消费offsets
    * @param rdd
    */
  def updateZKOffsets[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) : Unit = {
    //获取消费者组
    val groupId = kafkaParams.get("group.id").getOrElse("default")
    //spark使用kafka低阶API进行消费的时候,每个partition的offset保存在spark的RDD中,所以这里可以直接从
    //RDD的 HasOffsetRanges 中获取到offsets信息。spark不会把这个信息存储到zookeeper中,所以
    //我们需要自己实现将这部分offsets信息存储到zookeeper中
    val offsetsList = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    //打印出spark中保存的offsets信息
    offsetsList.foreach(x=>{
      println("获取spark 中的偏移信息"+x)
    })

    for (offsets <- offsetsList) {
      //根据topic和partition 构建topicAndPartition
      val topicAndPartition = TopicAndPartition(offsets.topic, offsets.partition)
      logInfo("将SPARK中的 偏移信息 存到zookeeper中")
      //将消费者组的offsets更新到zookeeper中
      setOffsets(groupId, Map((topicAndPartition, offsets.untilOffset)))
    }
  }

  //(null,{"rksj":"1558178497","latitude":"24.000000","imsi":"000000000000000"})
  //读取kafka流,并将json数据转为map
  def createJsonToJMapObjectDirectStreamWithOffset(ssc:StreamingContext, topicsSet:Set[String]): DStream[java.util.Map[String,Object]] = {
    //一个转换器
    val converter = {json:String =>
      println(json)
      var res : java.util.Map[String,Object] = null
      try {
        //JSON转map的操作
        res = com.alibaba.fastjson.JSON.parseObject(json,
          new TypeReference[java.util.Map[String, Object]]() {})
      } catch {
        case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
      }
      res
    }
    createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
  }

  /**
    * 根据converter创建流数据
    * @param ssc
    * @param topicsSet
    * @param converter
    * @tparam T
    * @return
    */
  def createDirectStreamWithOffset[T:ClassTag](ssc:StreamingContext,
                                               topicsSet:Set[String], converter:String => T): DStream[T] = {
    createDirectStream[String, String, StringDecoder, StringDecoder](ssc, topicsSet)
      .map(pair =>converter(pair._2))
  }

  def createJsonToJMapDirectStreamWithOffset(ssc:StreamingContext,
                                             topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
    val converter = {json:String =>
      var res : java.util.Map[String,String] = null
      try {
        res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
      } catch {
        case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
      }
      res
    }
    createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
  }

  /*
    /**
      * @param ssc
      * @param topicsSet
      * @return
      */
    def createJsonToJavaBeanDirectStreamWithOffset(ssc:StreamingContext ,
                                                   topicsSet:Set[String]): DStream[Object] = {
      val converter = {json:String =>
        var res : Object = null
        try {
          res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[Object]() {})
        } catch {
          case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
        }
        res
      }
      createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
    }
  */

  /*
    def createStringDirectStreamWithOffset(ssc:StreamingContext ,
                                           topicsSet:Set[String]): DStream[String] = {
      val converter = {json:String =>
        json
      }
      createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
    }
  */

  /**
    * 读取JSON的流 并将JSON流 转为MAP流  并且这个流支持RDD向zookeeper中记录消费信息
    * @param ssc   spark ssc
    * @param topicsSet topic 集合 支持从多个kafka topic同时读取数据
    * @return  DStream[java.util.Map[String,String
    */
  def createJsonToJMapStringDirectStreamWithOffset(ssc:StreamingContext , topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
    val converter = {json:String =>
      var res : java.util.Map[String,String] = null
      try {
        res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
      } catch {
        case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
      }
      res
    }
    createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
  }

  /**
    * 读取JSON的流 并将JSON流 转为MAP流
    * 注意:当前实现与WithOffset版本一致,是否回写offset实际由构造KafkaManager时的autoUpdateoffset参数控制
    * @param ssc   spark ssc
    * @param topicsSet topic 集合 支持从多个kafka topic同时读取数据
    * @return  DStream[java.util.Map[String, String]]
    */
  def createJsonToJMapStringDirectStreamWithoutOffset(ssc:StreamingContext , topicsSet:Set[String]): DStream[java.util.Map[String,String]] = {
    val converter = {json:String =>
      var res : java.util.Map[String,String] = null
      try {
        res = com.alibaba.fastjson.JSON.parseObject(json, new TypeReference[java.util.Map[String, String]]() {})
      } catch {
        case e: Exception => logError(s"解析topic ${topicsSet}, 的记录 ${json} 失败。", e)
      }
      res
    }
    createDirectStreamWithOffset(ssc, topicsSet, converter).filter(_ != null)
  }

}

object KafkaManager extends Logging{

  def apply(broker:String, groupId:String = "default",
            numFetcher:Int = 1, offset:String = "smallest",
            autoUpdateoffset:Boolean = true): KafkaManager ={
    new KafkaManager(
      createKafkaParam(broker, groupId, numFetcher, offset),
      autoUpdateoffset)
  }

  def createKafkaParam(broker:String, groupId:String = "default",
                       numFetcher:Int = 1, offset:String = "smallest"): Map[String, String] ={
    //创建 stream 时使用的 topic 名字集合
    Map[String, String](
      "metadata.broker.list" -> broker,
      "auto.offset.reset" -> offset,
      "group.id" -> groupId,
      "num.consumer.fetchers" -> numFetcher.toString)
  }
}
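KafkaManager的一个最小使用示意如下(broker地址、topic、消费组均为假设值;完整用法见后文的Kafka2esStreaming):

import com.hsiehchou.spark.common.SparkContextFactory
import org.apache.spark.streaming.kafka.KafkaManager

//通过伴生对象构建KafkaManager,并创建带offset管理的JSON->Map流
val ssc = SparkContextFactory.newSparkStreamingContext("kafkaManagerDemo", 10L)
val kafkaManager = KafkaManager("hadoop1:9092,hadoop2:9092,hadoop3:9092", groupId = "demo_group")
val ds = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, Set("chl_test7"))

ds.foreachRDD(rdd => println("本批次记录数:" + rdd.count()))

ssc.start()
ssc.awaitTermination()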

5、resources/log4j.properties

### 设置###
log4j.rootLogger = error,stdout,D,E

### 输出信息到控制台 ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n

### 输出DEBUG 级别以上的日志到 E://logs/log.log ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = E://logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss}  [ %t:%r ] - [ %p ]  %m%n

###输出ERROR 级别以上的日志到=E://logs/error.log ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =E://logs/error.log 
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR 
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss}  [ %t:%r ] - [ %p ]  %m%n

6、xz_bigdata_spark/spark/streaming/kafka

Spark_Es_ConfigUtil.scala

package com.hsiehchou.spark.streaming.kafka

import org.apache.spark.Logging

object Spark_Es_ConfigUtil extends Serializable with Logging{

 // val ES_NODES = "es.cluster.nodes"
 // val ES_PORT = "es.cluster.http.port"
 // val ES_CLUSTERNAME = "es.cluster.name"

  val ES_NODES = "es.nodes"
  val ES_PORT = "es.port"
  val ES_CLUSTERNAME = "es.clustername"

  def getEsParam(id_field : String): Map[String,String] ={
    Map[String ,String]("es.mapping.id" -> id_field,
      ES_NODES -> "hadoop1,hadoop2,hadoop3",
      //ES_NODES -> "hadoop1",
      ES_PORT -> "9200",
      ES_CLUSTERNAME -> "xz_es",
      "es.batch.size.entries"->"6000",
      /*   "es.nodes.wan.only"->"true",*/
      "es.nodes.discovery"->"true",
      "es.batch.size.bytes"->"300000000",
      "es.batch.write.refresh"->"false"
    )
  }
}

Spark_Kafka_ConfigUtil.scala

package com.hsiehchou.spark.streaming.kafka

import org.apache.spark.Logging

object Spark_Kafka_ConfigUtil extends Serializable with Logging{

  def getKafkaParam(brokerList:String,groupId : String): Map[String,String]={
    val kafkaParam=Map[String,String](
      "metadata.broker.list" -> brokerList,
      "auto.offset.reset" -> "smallest",
      "group.id" -> groupId,
      "refresh.leader.backoff.ms" -> "1000",
      "num.consumer.fetchers" -> "8")
    kafkaParam
  }
}

7、kafka2es

Kafka2esJob.scala

package com.hsiehchou.spark.streaming.kafka.kafka2es

import com.hsiehchou.es.admin.AdminUtil
import com.hsiehchou.es.client.ESClientUtils
import com.hsiehchou.spark.common.convert.DataConvert
import com.hsiehchou.spark.streaming.kafka.Spark_Es_ConfigUtil
import org.apache.spark.Logging
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.spark.rdd.EsSpark

object Kafka2esJob extends Serializable with Logging {

  /**
    * 按日期分组写入ES
    * @param dataType
    * @param typeDS
    */
  def insertData2EsBydate(dataType:String,typeDS:DStream[java.util.Map[String,String]]): Unit ={

    //通过 dataType + 日期来动态创建 分索引。 日期格式为 yyyyMMdd
    //一个批次里可能混杂多个日期的数据,按日期过滤分组即可
    //index前缀 + 日期过滤,避免shuffle操作
    val index_prefix = dataType
    val client: TransportClient = ESClientUtils.getClient
    typeDS.foreachRDD(rdd=>{

      //如果是少量数据,也可以直接用rdd.groupBy()处理
      //把所有的日期拿到
      val days = getDays(dataType,rdd)

      //我们使用日期对数据进行过滤;par是scala并行集合
      days.par.foreach(day=>{

        //通过前缀+日期组成一个动态的索引,例如 qq + "_" + "20190508"
        val index = index_prefix + "_" + day

        //判断索引是否存在
        val bool = AdminUtil.indexExists(client,index)
        if(!bool){
          //如果不存在,创建
          val mappingPath = s"es/mapping/${index_prefix}.json"
          AdminUtil.buildIndexAndTypes(index, index, mappingPath, 5, 1)
        }
        //过滤出该日期的数据,得到"单一数据类型 + 单一日期"的RDD
        val tableRDD = rdd.filter(map=>{
          day.equals(map.get("index_date"))
        }).map(x=>{
          //将map[String,String] 转为map[String,obJECT]
          DataConvert.strMap2esObjectMap(x)
        })
        EsSpark.saveToEs(tableRDD,index+ "/"+index,Spark_Es_ConfigUtil.getEsParam("id"))
      })
    })
    //日期为后
  }

  /**
    * 获取日期的集合
    * @param dataType
    * @param rdd
    * @return
    */
  def getDays(dataType:String,rdd:RDD[java.util.Map[String,String]]): Array[String] ={
    //对日期去重,然后集中到driver
    return  rdd.map(x=>{x.get("index_date")}).distinct().collect()
  }

  /**
    * 将RDD转换之后写入ES
    * @param dataType
    * @param typeRDD
    */
  def insertData2Es(dataType:String,typeRDD:RDD[java.util.Map[String,String]]): Unit = {
    val index = dataType
    val esRDD =  typeRDD.map(x=>{
      DataConvert.strMap2esObjectMap(x)
    })
    EsSpark.saveToEs(esRDD,index+ "/"+index,Spark_Es_ConfigUtil.getEsParam("id"))
    println("写入ES" + esRDD.count() + "条数据成功")
  }

  /**
    * 将RDD转换后写入ES
    * @param dataType
    * @param typeDS
    */
  def insertData2Es(dataType:String, typeDS:DStream[java.util.Map[String, String]]): Unit = {
    val index = dataType
    typeDS.foreachRDD(rdd=>{
      val esRDD = rdd.map(x=>{
        DataConvert.strMap2esObjectMap(x)
      })
      EsSpark.saveToEs(rdd, dataType+"/"+dataType, Spark_Es_ConfigUtil.getEsParam("id"))
      println("写入ES" + esRDD.count() + "条数据成功")
    })
  }
}

Kafka2esStreaming.scala

package com.hsiehchou.spark.streaming.kafka.kafka2es

import java.util
import java.util.Properties

import com.hsiehchou.common.config.ConfigUtil
import com.hsiehchou.common.project.datatype.DataTypeProperties
import com.hsiehchou.common.time.TimeTranstationUtils
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import org.apache.commons.lang3.StringUtils
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaManager

import scala.collection.JavaConversions._

object Kafka2esStreaming extends Serializable with Logging {
  //获取数据类型
  private val dataTypes: util.Set[String] = DataTypeProperties.dataTypeMap.keySet()

  val kafkaConfig: Properties = ConfigUtil.getInstance().getProperties("kafka/kafka-server-config.properties")

  def main(args: Array[String]): Unit = {

    //val topics = "chl_test7".split(",")
    val topics = args(1).split(",")

    //   val ssc = SparkConfFactory.newSparkLocalStreamingContext("XZ_kafka2es", java.lang.Long.valueOf(10),1)
    val ssc = SparkContextFactory.newSparkStreamingContext("Kafka2esStreaming", java.lang.Long.valueOf(10))

    //构建kafkaManager
    val kafkaManager = new KafkaManager(
      Spark_Kafka_ConfigUtil.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"), "XZ3")
    )

    //使用kafkaManager创建DStreaming流
    val kafkaDS = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
      //添加一个日期分组字段
      //如果数据其他的转换,可以先在这里进行统一转换
      .map(map=>{
      map.put("index_date",TimeTranstationUtils.Date2yyyyMMddHHmmss(java.lang.Long.valueOf(map.get("collect_time")+"000")))
      map
    }).persist(StorageLevel.MEMORY_AND_DISK)

    //使用par并行集合可以使任务并发执行(在资源充足的情况下)
    dataTypes.foreach(datatype=>{
      //过滤出单个类别的数据种类
      val tableDS = kafkaDS.filter(x=>{datatype.equals(x.get("table"))})
      Kafka2esJob.insertData2Es(datatype,tableDS)
    })

    ssc.start()
    ssc.awaitTermination()
  }

  /**
    * 启动参数检查
    * @param args
    */
  def sparkParamCheck(args: Array[String]): Unit ={
    if (args.length == 4) {
      if (StringUtils.isBlank(args(1))) {
        logInfo("kafka集群地址不能为空")
        logInfo("kafka集群地址格式为     主机1名:9092,主机2名:9092,主机3名:9092...")
        logInfo("格式为     主机1名:9092,主机2名:9092,主机3名:9092...")
        System.exit(-1)
      }
      if (StringUtils.isBlank(args(2))) {
        logInfo("kafka topic1不能为空")
        System.exit(-1)
      }
      if (StringUtils.isBlank(args(3))) {
        logInfo("kafka topic2不能为空")
        System.exit(-1)
      }
    }else{
      logError("启动参数个数错误")
    }
  }

  def startJob(ds:DStream[String]): Unit ={
  }
}

java/com/hsiehchou/spark/common/convert/BaseDataConvert.java

package com.hsiehchou.spark.common.convert;

import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.HashMap;
import java.util.Map;

public class BaseDataConvert {

    private static final Logger LOG = LoggerFactory.getLogger(BaseDataConvert.class);

    public static HashMap<String,Object> mapString2Long(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
        String logouttime = map.get(key);
        if (StringUtils.isNotBlank(logouttime)) {
            objectMap.put(key, Long.valueOf(logouttime));
        } else {
            objectMap.put(key, 0L);
        }
        return objectMap;
    }

    public static HashMap<String,Object> mapString2Double(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
        String logouttime = map.get(key);
        if (StringUtils.isNotBlank(logouttime)) {
            objectMap.put(key, Double.valueOf(logouttime));
        } else {
            objectMap.put(key, 0.000000);
        }
        return objectMap;
    }

    public static HashMap<String,Object> mapString2String(Map<String,String> map, String key, HashMap<String,Object> objectMap) {
        String logouttime = map.get(key);
        if (StringUtils.isNotBlank(logouttime)) {
            objectMap.put(key, logouttime);
        } else {
            objectMap.put(key, "");
        }
        return objectMap;
    }
}

8、ES动态索引创建

/**
    * 按日期分组写入ES
    * @param dataType
    * @param typeDS
    */
  def insertData2EsBydate(dataType:String,typeDS:DStream[java.util.Map[String,String]]): Unit ={

    //通过 dataType + 日期来动态创建 分索引。 日期格式为 yyyyMMdd
    //主要就是时间混杂  通过时间分组就行了 groupby       filter
    //index前缀  通过对日期进行过滤 避免shuffle操作
    val index_prefix = dataType
    val client: TransportClient = ESClientUtils.getClient
    typeDS.foreachRDD(rdd=>{

      //for a small amount of data a simple rdd.groupBy() would also do
      //collect all distinct days present in this batch
      val days = getDays(dataType,rdd)

      //filter the data by day; par is Scala's parallel collection, so the days are processed concurrently
      days.par.foreach(day=>{

        //prefix + day form the dynamic index name, e.g. qq + "_" + "20190508"
        val index = index_prefix + "_" + day

        //判断索引是否存在
        val bool = AdminUtil.indexExists(client,index)
        if(!bool){
          //如果不存在,创建
          val mappingPath = s"es/mapping/${index_prefix}.json"
          AdminUtil.buildIndexAndTypes(index, index, mappingPath, 5, 1)
        }
        //build the RDD holding this single data type for this single day
        val tableRDD = rdd.filter(map=>{
          day.equals(map.get("index_date"))
        }).map(x=>{
          //convert Map[String,String] to Map[String,Object]
          DataConvert.strMap2esObjectMap(x)
        })
        EsSpark.saveToEs(tableRDD,index+ "/"+index,Spark_Es_ConfigUtil.getEsParam("id"))
      })
    })
    //the date is always the suffix of the index name
  }

The xz_bigdata_es module that provides these ES utilities is shown in the next section; writing into ES uses these dynamic, per-day indices.
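
A minimal sketch of the same check-then-create logic driven directly from Java, using the AdminUtil and ESClientUtils classes shown in that section; the wechat data type, the day value, and the existence of es/mapping/wechat.json are assumptions:

import com.hsiehchou.es.admin.AdminUtil;
import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.client.transport.TransportClient;

public class DynamicIndexDemo {
    public static void main(String[] args) throws Exception {
        String dataType = "wechat";          // illustrative data type
        String day = "20190508";             // yyyyMMdd, normally taken from the record's index_date
        String index = dataType + "_" + day; // dynamic index name, e.g. wechat_20190508

        TransportClient client = ESClientUtils.getClient();
        if (!AdminUtil.indexExists(client, index)) {
            // the mapping file is looked up by data type, not by the dated index name
            AdminUtil.buildIndexAndTypes(index, index, "es/mapping/" + dataType + ".json", 5, 1);
        }
    }
}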

9、CDH的java配置和Elasticsearch的配置

CDH JDK path:
/usr/local/jdk1.8

Kafka configuration (set through the CDH Kafka service):

Default Number of Partitions (num.partitions): 8

Offset Commit Topic Number of Partitions: 180 days

Log Compaction Delete Record Retention Time (log.cleaner.delete.retention.ms): 30 days

Data Log Roll Hours (log.retention.hours / log.roll.hours): 30 days

Java Heap Size of Broker (broker_max_heap_size): 1 GiB

YARN
Container memory: 5 GB / 5 GB / 1 GB / 10 GB

The CDH installation itself is covered in a separate article.

Prerequisite: Elasticsearch is already installed on the nodes.

mkdir /opt/software/elasticsearch/data/

mkdir /opt/software/elasticsearch/logs/

chmod 777 /opt/software/elasticsearch/data/

useradd elasticsearch
passwd elasticsearch

chown -R elasticsearch elasticsearch/

vim /etc/security/limits.conf
添加如下内容:
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096

进入limits.d目录下修改配置文件
vim /etc/security/limits.d/90-nproc.conf

修改如下内容:
soft nproc 4096(修改为此参数,6版本的默认就是4096)

修改配置sysctl.conf
vim /etc/sysctl.conf

添加下面配置:
vm.max_map_count=655360

并执行命令:
sysctl -p

hadoop1的conf配置
elasticsearch.yml

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application

cluster.name: xz_es

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#

node.name: node-1

node.master: true

node.data: true

# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data

path.data: /opt/software/elasticsearch/data

#
# Path to log files:
#
#path.logs: /path/to/logs

path.logs: /opt/software/elasticsearch/logs

#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true

bootstrap.memory_lock: false

bootstrap.system_call_filter: false

#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1

network.host: 192.168.116.201

#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]

discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]

#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

jvm.options
Change the heap settings to:
-Xms64m
-Xmx64m

hadoop2的conf配置
elasticsearch.yml (the same template as on hadoop1; only the active settings are listed, with the node-specific values changed):

cluster.name: xz_es

node.name: node-2

node.master: false

node.data: true

path.data: /opt/software/elasticsearch/data

path.logs: /opt/software/elasticsearch/logs

bootstrap.memory_lock: false

bootstrap.system_call_filter: false

network.host: 192.168.116.202

discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]

jvm.options
Change the heap settings to:
-Xms64m
-Xmx64m

hadoop3的conf配置
elasticsearch.yml (the same template as on hadoop1; only the active settings are listed, with the node-specific values changed):

cluster.name: xz_es

node.name: node-3

node.master: false

node.data: true

path.data: /opt/software/elasticsearch/data

path.logs: /opt/software/elasticsearch/logs

bootstrap.memory_lock: false

bootstrap.system_call_filter: false

network.host: 192.168.116.203

discovery.zen.ping.unicast.hosts: ["hadoop1", "hadoop2", "hadoop3"]

jvm.options
Change the heap settings to:
-Xms64m
-Xmx64m

Kibana的conf配置

kibana.yml

# Kibana is served by a back end server. This setting specifies the port to use.

server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
#server.host: "localhost"

server.host: "192.168.116.202"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy. This only affects
# the URLs generated by Kibana, your proxy is expected to remove the basePath value before forwarding requests
# to Kibana. This setting cannot end in a slash.
#server.basePath: ""

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URL of the Elasticsearch instance to use for all your queries.
#elasticsearch.url: "http://localhost:9200"

elasticsearch.url: "http://192.168.116.201:9200"

运行Elasticsearch
cd /opt/software/elasticsearch
su elasticsearch
bin/elasticsearch &

运行Kibana
cd /opt/software/kibana/
bin/kibana &

10、kafka2es打包到集群执行

打包
使用maven工具点击install

放入集群
Copy the dependency jars into /usr/chl/spark7/jars/ and the packaged xz_bigdata_spark-1.0-SNAPSHOT.jar into /usr/chl/spark7/ (matching the paths used by the spark-submit command below).

执行
spark-submit --master yarn-cluster --num-executors 1 --driver-memory 500m --executor-memory 1g --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar chl_test7 chl_test7

spark-submit
--master yarn-cluster //run on the YARN cluster
--num-executors 1 //number of executor processes
--driver-memory 500m //driver memory
--executor-memory 1g //memory per executor process
--executor-cores 1 //cores (threads) per executor
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') //attach the dependency jars
--class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar

11、运行截图

kafka2esstreaming截图

Elasticsearch各个节点状况

12、冲突查找快捷键

Ctrl+Alt+Shift+N

八、xz_bigdata_es开发

1、pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata2</artifactId>
        <groupId>com.hsiehchou</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_es</artifactId>

    <name>xz_bigdata_es</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_resources</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>6.2.3</version>
        </dependency>

        <dependency>
            <groupId>io.searchbox</groupId>
            <artifactId>jest</artifactId>
            <version>6.3.1</version>
        </dependency>
    </dependencies>
</project>

2、admin

AdminUtil.java

package com.hsiehchou.es.admin;

import com.hsiehchou.common.file.FileCommon;
import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AdminUtil {
    private static Logger LOG = LoggerFactory.getLogger(AdminUtil.class);

    public static void main(String[] args) throws Exception{
        //create the index and its mapping
        AdminUtil.buildIndexAndTypes("tanslator_test1111","tanslator_test1111", "es/mapping/test.json",3,1);
        //index = 类型+日期
        //查找类  Ctrl+Shift+Alt+N
    }

    /**
     * @param index
     * @param type
     * @param path
     * @param shard
     * @param replication
     * @return
     * @throws Exception
     */
    public static boolean buildIndexAndTypes(String index,String type,String path,int shard,int replication) throws Exception{
        boolean flag ;
        TransportClient client = ESClientUtils.getClient();
        String mappingJson = FileCommon.getAbstractPath(path);

        boolean indices = AdminUtil.createIndices(client, index, shard, replication);
        if(indices){
            LOG.info("创建索引"+ index + "成功");
            flag = MappingUtil.addMapping(client, index, type, mappingJson);
        }
        else{
            LOG.error("创建索引"+ index + "失败");
            flag = false;
        }
        return flag;
    }

    /**
     * @desc 判断需要创建的index是否存在
     * */
    public static boolean indexExists(TransportClient client,String index){
        boolean ifExists = false;
        try {
            System.out.println("client===" + client);
            IndicesExistsResponse existsResponse = client.admin().indices().prepareExists(index).execute().actionGet();
            ifExists = existsResponse.isExists();
        } catch (Exception e) {
            e.printStackTrace();
            LOG.error("判断index是否存在失败...");
            return ifExists;
        }
        return ifExists;
    }

    /**
     * 创建索引
     * @param client
     * @param index
     * @param shard
     * @param replication
     * @return
     */
    public static boolean createIndices(TransportClient client, String index, int shard , int replication){

        if(!indexExists(client,index)) {
            LOG.info("该index不存在,创建...");
            CreateIndexResponse createIndexResponse =null;
            try {
                createIndexResponse = client.admin().indices().prepareCreate(index)
                        .setSettings(Settings.builder()
                                .put("index.number_of_shards", shard)
                                .put("index.number_of_replicas", replication)
                                .put("index.codec", "best_compression")
                                .put("refresh_interval", "30s"))
                        .execute().actionGet();
                return createIndexResponse.isAcknowledged();
            } catch (Exception e) {
                LOG.error(null, e);
                return false;
            }
        }
        LOG.warn("该index " + index + " 已经存在...");
        return false;
    }
}

MappingUtil.java

package com.hsiehchou.es.admin;

import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class MappingUtil {

    private static Logger LOG = LoggerFactory.getLogger(MappingUtil.class);
    //关闭自动添加字段,关闭后索引数据中如果有多余字段不会修改mapping,默认true
    private boolean dynamic = true;

    public static XContentBuilder buildMapping(String tableName) throws IOException {
        XContentBuilder builder = null;
        try {
            builder = XContentFactory.jsonBuilder().startObject()
                    .startObject(tableName)
                    .startObject("_source").field("enabled", true).endObject()
                    .startObject("properties")
                    .startObject("id").field("type", "long").endObject()
                    .startObject("sn").field("type", "text").endObject()
                    .endObject()  
                .endObject()  
                .endObject();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return builder;
    }

    public static boolean addMapping(TransportClient client, String index, String type, String jsonString){
        PutMappingResponse putMappingResponse = null;
        try {
            PutMappingRequest mappingRequest = new PutMappingRequest(index)
                    .type(type).source(JSON.parseObject(jsonString));
            putMappingResponse = client.admin().indices().putMapping(mappingRequest).actionGet();
        } catch (Exception e) {
            LOG.error(null,e);
            e.printStackTrace();
            LOG.error("添加" + type + "的mapping失败....",e);
            return false;
        }
        boolean success = putMappingResponse.isAcknowledged();
        if (success){
            LOG.info("创建" + type + "的mapping成功....");
            return success;
        }
        return success;
    }

    public static void main(String[] args) throws Exception {
        /*String singleConf = ConsulConfigUtil.getSingleConf("es6.1.0/mapping/http");
        int i = singleConf.length() / 2;
        System.out.println(i);*/
    }
}
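
A hedged usage sketch of addMapping, assuming the demo class sits in the com.hsiehchou.es.admin package; the index name, field names and the inline mapping JSON are illustrative and mirror the shape expected from the es/mapping/*.json files:

import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.client.transport.TransportClient;

public class MappingDemo {
    public static void main(String[] args) {
        TransportClient client = ESClientUtils.getClient();

        // make sure the index exists before putting a mapping on it
        AdminUtil.createIndices(client, "demo_index", 5, 1);

        // minimal mapping: a keyword field and a long field
        String mappingJson = "{\"properties\":{"
                + "\"phone_mac\":{\"type\":\"keyword\"},"
                + "\"collect_time\":{\"type\":\"long\"}}}";

        boolean ok = MappingUtil.addMapping(client, "demo_index", "demo_index", mappingJson);
        System.out.println("mapping added: " + ok);
    }
}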

3、client

ESClientUtils.java

package com.hsiehchou.es.client;

import com.hsiehchou.common.config.ConfigUtil;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.Serializable;
import java.net.InetAddress;
import java.util.Properties;

/**
 * ES 客户端获取
 */
public class ESClientUtils implements Serializable{

    private static Logger LOG = LoggerFactory.getLogger(ESClientUtils.class);
    private volatile static TransportClient esClusterClient;
    private ESClientUtils(){}
    private static Properties properties;
    static {
        properties = ConfigUtil.getInstance().getProperties("es/es_cluster.properties");
    }

    public static TransportClient getClient(){
        System.setProperty("es.set.netty.runtime.available.processors", "false");
        String clusterName = properties.getProperty("es.cluster.name");
        String clusterNodes1 = properties.getProperty("es.cluster.nodes1");
        String clusterNodes2 = properties.getProperty("es.cluster.nodes2");
        String clusterNodes3 = properties.getProperty("es.cluster.nodes3");
        LOG.info("clusterName:"+ clusterName);
        LOG.info("clusterNodes:"+ clusterNodes1);
        LOG.info("clusterNodes:"+ clusterNodes2);
        LOG.info("clusterNodes:"+ clusterNodes3);
        if(esClusterClient==null){
            synchronized (ESClientUtils.class){
                if(esClusterClient==null){
                    try{
                        Settings settings = Settings.builder()
                                .put("cluster.name", clusterName)
                                //.put("searchguard.ssl.transport.enabled", false)
                                //.put("xpack.security.user", "sc_xy_mn_es:xy@66812.com")
                               // .put("transport.type","netty3")
                               // .put("http.type","netty3")
                                .put("client.transport.sniff",true).build();//开启自动嗅探功能
                        esClusterClient = new PreBuiltTransportClient(settings)
                                .addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes1), 9300))
                                .addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes2), 9300))
                                .addTransportAddress(new TransportAddress(InetAddress.getByName(clusterNodes3), 9300));
                        LOG.info("esClusterClient========" + esClusterClient.listedNodes());
                    }catch (Exception e){
                        LOG.error("获取客户端失败",e);
                    }finally {

                    }
                }
            }
        }
        return esClusterClient;
    }

    public static void main(String[] args) {
        TransportClient client = ESClientUtils.getClient();
        System.out.println(client);
    }
}
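
ESClientUtils reads es/es_cluster.properties from the classpath (the resources module). The sketch below lists the keys the code above looks up, with cluster values that are assumptions for this hadoop1/2/3 setup, followed by a simple connectivity check; the demo class is assumed to sit in the same package:

// Assumed content of es/es_cluster.properties (key names come from the getProperty calls above,
// the values are illustrative for this cluster):
//   es.cluster.name=xz_es
//   es.cluster.nodes1=hadoop1
//   es.cluster.nodes2=hadoop2
//   es.cluster.nodes3=hadoop3
import org.elasticsearch.client.transport.TransportClient;

public class EsClientCheck {
    public static void main(String[] args) {
        // singleton client, transport protocol on port 9300
        TransportClient client = ESClientUtils.getClient();
        // with client.transport.sniff enabled this should list the discovered nodes
        System.out.println(client.connectedNodes());
    }
}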

4、jest/service

IndexTypeUtil.java

package com.hsiehchou.es.jest.service;

import com.hsiehchou.common.config.JsonReader;
import io.searchbox.client.JestClient;

public class IndexTypeUtil {

    public static void main(String[] args) {
        IndexTypeUtil.createIndexAndType("tanslator","es/mapping/tanslator.json");
       // IndexTypeUtil.createIndexAndType("task");
      //  IndexTypeUtil.createIndexAndType("ability");
       // IndexTypeUtil.createIndexAndType("paper");
    }

    public static void createIndexAndType(String index,String jsonPath){
        try{
            JestClient jestClient = JestService.getJestClient();
            JestService.createIndex(jestClient, index);
            JestService.createIndexMapping(jestClient,index,index,getSourceFromJson(jsonPath));
        }catch (Exception e){
            e.printStackTrace();
            //LOG.error("创建索引失败",e);
        }
    }
    public static String getSourceFromJson(String path){
        return JsonReader.readJson(path);
    }

    public static String getSource(String index){
        if(index.equals("task")){
            return "{\"_source\": {\n" +
                    "    \"enabled\": true\n" +
                    "  },\n" +
                    "  \"properties\": {\n" +
                    "    \"taskwordcount\": {\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"taskprice\": {\n" +
                    "      \"type\": \"float\"\n" +
                    "    }\n" +
                    "  }\n" +
                    "}";
        }

        if(index.equals("tanslator")){
            return "{\n" +
                    "  \"_source\": {\n" +
                    "    \"enabled\": true\n" +
                    "  },\n" +
                    "  \"properties\": {\n" +
                    "    \"birthday\": {\n" +
                    "      \"type\": \"text\",\n" +
                    "      \"fields\": {\n" +
                    "        \"keyword\": {\n" +
                    "          \"ignore_above\": 256,\n" +
                    "          \"type\": \"keyword\"\n" +
                    "        }\n" +
                    "      }\n" +
                    "    },\n" +
                    "    \"createtime\":{\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"updatetime\":{\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"avgcooperation\":{\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"cooperationwordcount\":{\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"cooperation\":{\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"cooperationtime\":{\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"age\":{\n" +
                    "      \"type\": \"long\"\n" +
                    "    },\n" +
                    "    \"industry\": {\n" +
                    "      \"type\": \"nested\",\n" +
                    "      \"properties\": {\n" +
                    "        \"industryname\": {\n" +
                    "          \"type\": \"text\",\n" +
                    "          \"fields\": {\n" +
                    "            \"keyword\": {\n" +
                    "              \"ignore_above\": 256,\n" +
                    "              \"type\": \"keyword\"\n" +
                    "            }\n" +
                    "          }\n" +
                    "        },\n" +
                    "        \"count\": {\n" +
                    "          \"type\": \"long\"\n" +
                    "        },\n" +
                    "        \"industryid\": {\n" +
                    "          \"type\": \"text\",\n" +
                    "          \"fields\": {\n" +
                    "            \"keyword\": {\n" +
                    "              \"ignore_above\": 256,\n" +
                    "              \"type\": \"keyword\"\n" +
                    "            }\n" +
                    "          }\n" +
                    "        }\n" +
                    "      }\n" +
                    "    }\n" +
                    "\n" +
                    "  }\n" +
                    "}";
        }
        return "";
    }
}

JestService.java

package com.hsiehchou.es.jest.service;

import com.hsiehchou.common.file.FileCommon;
import com.google.gson.GsonBuilder;
import io.searchbox.action.Action;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.JestResult;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.*;
import io.searchbox.indices.CreateIndex;
import io.searchbox.indices.DeleteIndex;
import io.searchbox.indices.IndicesExists;
import io.searchbox.indices.mapping.GetMapping;
import io.searchbox.indices.mapping.PutMapping;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.List;
import java.util.Map;

public class JestService {

    private static Logger LOG = LoggerFactory.getLogger(JestService.class);


    /**
     * 获取JestClient对象
     *
     * @return
     */
    public static JestClient getJestClient() {

        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig
                .Builder("http://hadoop1:9200")
                //.defaultCredentials("sc_xy_mn_es","xy@66812.com")
                .gson(new GsonBuilder().setDateFormat("yyyy-MM-dd'T'hh:mm:ss").create())
                .connTimeout(1500)
                .readTimeout(3000)
                .multiThreaded(true)
                .build());
        return factory.getObject();
    }


    public static void main(String[] args) throws Exception {
        JestClient jestClient = null;
//        Map<String, Long> stringLongMap = null;
        List<Map<String, Object>> maps = null;
        try {
            jestClient = JestService.getJestClient();
           /* SearchResult aggregation = JestService.aggregation(jestClient,
                    "wechat",
                    "wechat",
                    "collect_time");
            stringLongMap = ResultParse.parseAggregation(aggregation);*/
           /* SearchResult search = search(jestClient,
                    "wechat",
                    "wechat",
                    "id",
                    "65a3d548bd3e42b1972191bc2bd2829b",
                    "collect_time",
                    "desc",
                    1,
                    2);*/
            /*SearchResult search = search(jestClient,
                    "",
                    "",
                    "phone_mac",
                    "aa-aa-aa-aa-aa-aa",
                    "collect_time",
                    "asc",
                    1,
                    1000);*/

//            System.out.println(indexExists(jestClient,"wechat"));
            System.out.println("wechat数据量:"+count(jestClient,"wechat","wechat"));
            System.out.println(aggregation(jestClient,"wechat","wechat", "phone"));

            String[] includes = new String[]{"latitude","longitude","collect_time"};
//            try{
            SearchResult search = JestService.search(jestClient,
                    "",
                    "",
                    "phone_mac.keyword",
                    "aa-aa-aa-aa-aa-aa",
                    "collect_time",
                    "asc",
                    1,
                    2000);
            maps = ResultParse.parseSearchResultOnly(search);
            System.out.println(maps.size());
            System.out.println(maps);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            JestService.closeJestClient(jestClient);
        }
        System.out.println(maps);
//        } catch (Exception e) {
//            e.printStackTrace();
//        }finally {
//            JestService.closeJestClient(jestClient);
//        }
//        System.out.println(stringLongMap);
    }


    /**
     * 统计一个索引所有数据
     * @param jestClient
     * @param indexName
     * @param typeName
     * @return
     * @throws Exception
     */

    public static Long count(JestClient jestClient,
                             String indexName,
                             String typeName) throws Exception {
        Count count = new Count.Builder()
                .addIndex(indexName)
                .addType(typeName)
                .build();
        CountResult results = jestClient.execute(count);

        return results.getCount().longValue();
    }


    /**
     * 聚合分组查询
     * @param jestClient
     * @param indexName
     * @param typeName
     * @param field
     * @return
     * @throws Exception
     */
    public static SearchResult  aggregation(JestClient jestClient, String indexName, String typeName, String field) throws Exception {

        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        //分组聚合API
        AggregationBuilder group1 = AggregationBuilders.terms("group1").field(field);
        //group1.subAggregation(AggregationBuilders.terms("group2").field(query));
        searchSourceBuilder.aggregation(group1);
        searchSourceBuilder.size(0);
        System.out.println(searchSourceBuilder.toString());
        Search search = new Search.Builder(searchSourceBuilder.toString())
                .addIndex(indexName)
                .addType(typeName).build();
        SearchResult result = jestClient.execute(search);
        return result;
    }


    //基础封装
    public static SearchResult search(
            JestClient jestClient,
            String indexName,
            String typeName,
            String field,
            String fieldValue,
            String sortField,
            String sortValue,
            int pageNumber,
            int pageSize,
            String[] includes) {
        //构造一个查询体  封装的就是查询语句
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        searchSourceBuilder.fetchSource(includes,new String[0]);

        //查询构造器
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        if(StringUtils.isEmpty(field)){
            boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
        }else{
            boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
        }

        searchSourceBuilder.query(boolQueryBuilder);
        //定义分页
        //从什么时候开始
        searchSourceBuilder.from((pageNumber-1)*pageSize);
        searchSourceBuilder.size(pageSize);

        //设置排序
        if("desc".equals(sortValue)){
            searchSourceBuilder.sort(sortField,SortOrder.DESC);
        }else{
            searchSourceBuilder.sort(sortField,SortOrder.ASC);
        }

        System.out.println("sql =====" + searchSourceBuilder.toString());

        //构造一个查询执行器
        Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
        //设置indexName typeName
        if(StringUtils.isNotBlank(indexName)){
            builder.addIndex(indexName);
        }
        if(StringUtils.isNotBlank(typeName)){
            builder.addType(typeName);
        }

        Search build = builder.build();
        SearchResult searchResult = null;
        try {
            searchResult = jestClient.execute(build);
        } catch (IOException e) {
            LOG.error("查询失败",e);
        }
        return searchResult;
    }

    //基础封装
    public static SearchResult search(
            JestClient jestClient,
            String indexName,
            String typeName,
            String field,
            String fieldValue,
            String sortField,
            String sortValue,
            int pageNumber,
            int pageSize) {

        //构造一个查询体  封装的就是查询语句
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        //查询构造器
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        if(StringUtils.isEmpty(field)){
            boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.matchAllQuery());
        }else{
            boolQueryBuilder = boolQueryBuilder.must(QueryBuilders.termQuery(field,fieldValue));
        }

        searchSourceBuilder.query(boolQueryBuilder);
        //定义分页
        //从什么时候开始
        searchSourceBuilder.from((pageNumber-1)*pageSize);
        searchSourceBuilder.size(pageSize);

        //设置排序
        if("desc".equals(sortValue)){
            searchSourceBuilder.sort(sortField,SortOrder.DESC);
        }else{
            searchSourceBuilder.sort(sortField,SortOrder.ASC);
        }

        System.out.println("sql =====" + searchSourceBuilder.toString());

        //构造一个查询执行器
        Search.Builder builder = new Search.Builder(searchSourceBuilder.toString());
        //设置indexName typeName
        if(StringUtils.isNotBlank(indexName)){
            builder.addIndex(indexName);
        }
        if(StringUtils.isNotBlank(typeName)){
            builder.addType(typeName);
        }

        Search build = builder.build();
        SearchResult searchResult = null;
        try {
            searchResult = jestClient.execute(build);
        } catch (IOException e) {
            LOG.error("查询失败",e);
        }
        return searchResult;
    }


    /**
     * 判断索引是否存在
     *
     * @param jestClient
     * @param indexName
     * @return
     * @throws Exception
     */
    public static boolean indexExists(JestClient jestClient, String indexName) {
        JestResult result = null;
        try {
            Action action = new IndicesExists.Builder(indexName).build();
            result = jestClient.execute(action);
        } catch (IOException e) {
            LOG.error(null, e);
        }
        //result stays null when execute throws, so guard before dereferencing
        return result != null && result.isSucceeded();
    }


    /**
     * 创建索引
     *
     * @param jestClient
     * @param indexName
     * @return
     * @throws Exception
     */
    public static boolean createIndex(JestClient jestClient, String indexName) throws Exception {

        if (!JestService.indexExists(jestClient, indexName)) {
            JestResult jr = jestClient.execute(new CreateIndex.Builder(indexName).build());
            return jr.isSucceeded();
        } else {
            LOG.info("该索引已经存在");
            return false;
        }
    }

    public static boolean createIndexWithSettingsMapAndMappingsString(JestClient jestClient, String indexName, String type, String path) throws Exception {

        // String mappingJson = "{\"type1\": {\"_source\":{\"enabled\":false},\"properties\":{\"field1\":{\"type\":\"keyword\"}}}}";
        String mappingJson = FileCommon.getAbstractPath(path);
        String realMappingJson = "{" + type + ":" + mappingJson + "}";
        System.out.println(realMappingJson);
        CreateIndex createIndex = new CreateIndex.Builder(indexName)
                .mappings(realMappingJson)
                .build();
        JestResult jr = jestClient.execute(createIndex);
        return jr.isSucceeded();
    }


    /**
     * Put映射
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @param source
     * @return
     * @throws Exception
     */
    public static boolean createIndexMapping(JestClient jestClient, String indexName, String typeName, String source) throws Exception {

        PutMapping putMapping = new PutMapping.Builder(indexName, typeName, source).build();
        JestResult jr = jestClient.execute(putMapping);
        return jr.isSucceeded();
    }

    /**
     * Get映射
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @return
     * @throws Exception
     */
    public static String getIndexMapping(JestClient jestClient, String indexName, String typeName) throws Exception {

        GetMapping getMapping = new GetMapping.Builder().addIndex(indexName).addType(typeName).build();
        JestResult jr = jestClient.execute(getMapping);
        return jr.getJsonString();
    }

    /**
     * 索引文档
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @return
     * @throws Exception
     */
    public static boolean index(JestClient jestClient, String indexName, String typeName, String idField, List<Map<String, Object>> listMaps) throws Exception {

        Bulk.Builder bulk = new Bulk.Builder().defaultIndex(indexName).defaultType(typeName);
        for (Map<String, Object> map : listMaps) {
            if (map != null && map.containsKey(idField)) {
                Index index = new Index.Builder(map).id(map.get(idField).toString()).build();
                bulk.addAction(index);
            }
        }
        BulkResult br = jestClient.execute(bulk.build());
        return br.isSucceeded();
    }


    /**
     * 索引文档
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @return
     * @throws Exception
     */
    public static boolean indexString(JestClient jestClient, String indexName, String typeName, String idField, List<Map<String, String>> listMaps) throws Exception {
        if (listMaps != null && listMaps.size() > 0) {
            Bulk.Builder bulk = new Bulk.Builder().defaultIndex(indexName).defaultType(typeName);
            for (Map<String, String> map : listMaps) {
                if (map != null && map.containsKey(idField)) {
                    Index index = new Index.Builder(map).id(map.get(idField)).build();
                    bulk.addAction(index);
                }
            }
            BulkResult br = jestClient.execute(bulk.build());
            return br.isSucceeded();
        } else {
            return false;
        }
    }

    /**
     * 索引文档
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @return
     * @throws Exception
     */
    public static boolean indexOne(JestClient jestClient, String indexName, String typeName, String id, Map<String, Object> map) {
        Index.Builder builder = new Index.Builder(map);
        builder.id(id);
        builder.refresh(true);
        Index index = builder.index(indexName).type(typeName).build();
        try {
            JestResult result = jestClient.execute(index);
            if (result != null && !result.isSucceeded()) {
                throw new RuntimeException(result.getErrorMessage() + "插入更新索引失败!");
            }
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }


    /**
     * 搜索文档
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @param query
     * @return
     * @throws Exception
     */
    public static SearchResult search(JestClient jestClient, String indexName, String typeName, String query) throws Exception {

        Search search = new Search.Builder(query)
                .addIndex(indexName)
                .addType(typeName)
                .build();
        return jestClient.execute(search);
    }

    /**
     * Get文档
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @param id
     * @return
     * @throws Exception
     */
    public static JestResult get(JestClient jestClient, String indexName, String typeName, String id) throws Exception {

        Get get = new Get.Builder(indexName, id).type(typeName).build();
        return jestClient.execute(get);
    }

    /**
     * Delete索引
     *
     * @param jestClient
     * @param indexName
     * @return
     * @throws Exception
     */
    public boolean delete(JestClient jestClient, String indexName) throws Exception {

        JestResult jr = jestClient.execute(new DeleteIndex.Builder(indexName).build());
        return jr.isSucceeded();
    }

    /**
     * Delete文档
     *
     * @param jestClient
     * @param indexName
     * @param typeName
     * @param id
     * @return
     * @throws Exception
     */
    public static boolean delete(JestClient jestClient, String indexName, String typeName, String id) throws Exception {

        DocumentResult dr = jestClient.execute(new Delete.Builder(id).index(indexName).type(typeName).build());
        return dr.isSucceeded();
    }

    /**
     * 关闭JestClient客户端
     *
     * @param jestClient
     * @throws Exception
     */
    public static void closeJestClient(JestClient jestClient) {

        if (jestClient != null) {
            jestClient.shutdownClient();
        }
    }


    public static String query = "{\n" +
            "  \"size\": 1,\n" +
            "  \"query\": {\n" +
            "     \"match\": {\n" +
            "       \"taskexcuteid\": \"89899143\"\n" +
            "     }\n" +
            "  },\n" +
            "  \"aggs\": {\n" +
            "    \"count\": {\n" +
            "      \"terms\": {\n" +
            "        \"field\": \"source.keyword\"\n" +
            "      },\n" +
            "      \"aggs\": {\n" +
            "        \"sum_price\": {\n" +
            "          \"sum\": {\n" +
            "            \"field\": \"taskprice\"\n" +
            "          }\n" +
            "        },\n" +
            "        \"sum_wordcount\": {\n" +
            "          \"sum\": {\n" +
            "            \"field\": \"taskwordcount\"\n" +
            "          }\n" +
            "        },\n" +
            "        \"avg_taskprice\": {\n" +
            "          \"avg\": {\n" +
            "            \"field\": \"taskprice\"\n" +
            "          }\n" +
            "        }\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
}
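
A hedged usage sketch of JestService (index/type names and the document values are illustrative; the demo class is assumed to sit in the same package): bulk-index one record keyed by its id field, then count the documents in that index.

import io.searchbox.client.JestClient;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JestServiceDemo {
    public static void main(String[] args) throws Exception {
        JestClient jestClient = JestService.getJestClient();
        try {
            List<Map<String, Object>> docs = new ArrayList<>();
            Map<String, Object> doc = new HashMap<>();
            doc.put("id", "65a3d548bd3e42b1972191bc2bd2829b");   // illustrative document id
            doc.put("phone_mac", "aa-aa-aa-aa-aa-aa");
            doc.put("collect_time", 1557302400L);
            docs.add(doc);

            boolean ok = JestService.index(jestClient, "wechat_20190508", "wechat_20190508", "id", docs);
            System.out.println("bulk indexed: " + ok);
            // may lag behind the bulk request until the index refreshes
            System.out.println("doc count: " + JestService.count(jestClient, "wechat_20190508", "wechat_20190508"));
        } finally {
            JestService.closeJestClient(jestClient);
        }
    }
}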

ResultParse.java

package com.hsiehchou.es.jest.service;

import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonPrimitive;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestResult;
import io.searchbox.core.SearchResult;
import io.searchbox.core.search.aggregation.MetricAggregation;
import io.searchbox.core.search.aggregation.TermsAggregation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.*;

public class ResultParse {

    private static Logger LOG = LoggerFactory.getLogger(ResultParse.class);

    public static void main(String[] args) throws Exception {
        JestClient jestClient = JestService.getJestClient();

        /*long l = System.currentTimeMillis();
        JestClient jestClient = JestClientUtil.getJestClient();
        System.out.println(jestClient);
        String json ="{\n" +
                "  \"size\": 1, \n" +
                "  \"query\": {\n" +
                "    \"query_string\": {\n" +
                "      \"query\": \"中文\"\n" +
                "    }\n" +
                "  },\n" +
                "  \"highlight\": {\n" +
                "    \"pre_tags\" : [ \"<red>\" ],\n" +
                "    \"post_tags\" : [ \"</red>\" ],\n" +
                "    \"fields\":{\n" +
                "      \"secondlanguage\": {}\n" +
                "      ,\"firstlanguage\": {}\n" +
                "    }\n" +
                "  }\n" +
                "}";
        SearchResult search = JestService.search(jestClient, ES_INDEX.TANSLATOR_TEST, ES_INDEX.TANSLATOR_TEST,json);
        ResultParse.parseSearchResult(search);
        jestClient.shutdownClient();
        long l1 = System.currentTimeMillis();
        System.out.println(l1-l);*/
    }

    public static Map<String,Object> parseGet(JestResult getResult){
        Map<String,Object> map = null;
        JsonObject jsonObject = getResult.getJsonObject().getAsJsonObject("_source");
        if(jsonObject != null){
            map = new HashMap<String,Object>();
            //System.out.println(jsonObject);
            Set<Map.Entry<String, JsonElement>> entries = jsonObject.entrySet();
            for(Map.Entry<String, JsonElement> entry:entries){
                JsonElement value = entry.getValue();
                if(value.isJsonPrimitive()){
                    JsonPrimitive value1 = (JsonPrimitive) value;
                  //  LOG.error("转换前==========" + value1);
                    if( value1.isString() ){
                       // LOG.error("转换后==========" + value1.getAsString());
                        map.put(entry.getKey(),value1.getAsString());
                    }else{
                        map.put(entry.getKey(),value1);
                    }
                }else{
                    map.put(entry.getKey(),value);
                }
             }
        }
        return map;
    }

    public static Map<String,Object> parseGet2map(JestResult getResult){

        JsonObject source = getResult.getJsonObject().getAsJsonObject("_source");
        Gson gson = new Gson();
        Map map = gson.fromJson(source, Map.class);
        return map;
    }

    /**
     * 解析listMap
     * 结果格式为  {hits=0, total=0, data=[]}
     * @param search
     * @return
     */
    public static List<Map<String,Object>> parseSearchResultOnly(SearchResult search){

        List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();
        List<SearchResult.Hit<Object, Void>> hits = search.getHits(Object.class);
        for(SearchResult.Hit<Object, Void> hit : hits){
            Map<String,Object> source = (Map<String,Object>)hit.source;
            list.add(source);
        }
        return list;
    }

    /**
     * 解析listMap
     * 结果格式为  {hits=0, total=0, data=[]}
     * @param search
     * @return
     */
    public static Map<String,Long> parseAggregation(SearchResult search){
        Map<String,Long> mapResult = new HashMap<>();
        MetricAggregation aggregations = search.getAggregations();
        TermsAggregation group1 = aggregations.getTermsAggregation("group1");
        List<TermsAggregation.Entry> buckets = group1.getBuckets();
        buckets.forEach(x->{
            String key = x.getKey();
            Long count = x.getCount();
            mapResult.put(key,count);
        });
        return mapResult;
    }

    /**
     * 解析listMap
     * 结果格式为  {hits=0, total=0, data=[]}
     * @param search
     * @return
     */
    public static Map<String,Object> parseSearchResult(SearchResult search){

        Map<String,Object> map = new HashMap<String,Object>();
        List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();

        Long total = search.getTotal();
        map.put("total",total);
        List<SearchResult.Hit<Object, Void>> hits = search.getHits(Object.class);
        map.put("hits",hits.size());
        for(SearchResult.Hit<Object, Void> hit : hits){
            Map<String, List<String>> highlight = hit.highlight;
            Map<String,Object> source = (Map<String,Object>)hit.source;
            source.put("highlight",highlight);
            list.add(source);
        }
        map.put("data",list);
        return map;
    }
}
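
A hedged end-to-end sketch combining JestService and ResultParse (index/type and field names are assumptions; the demo class is assumed to live in the same package): run a terms aggregation on phone_mac.keyword and turn the buckets into a plain Map.

import io.searchbox.client.JestClient;
import io.searchbox.core.SearchResult;

import java.util.Map;

public class AggregationDemo {
    public static void main(String[] args) throws Exception {
        JestClient jestClient = JestService.getJestClient();
        try {
            // "group1" is the bucket name parseAggregation expects
            SearchResult result = JestService.aggregation(jestClient, "wechat", "wechat", "phone_mac.keyword");
            Map<String, Long> counts = ResultParse.parseAggregation(result);
            counts.forEach((mac, n) -> System.out.println(mac + " -> " + n));
        } finally {
            JestService.closeJestClient(jestClient);
        }
    }
}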

5、search

BuilderUtil.java

package com.hsiehchou.es.search;

import org.apache.commons.lang.StringUtils;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BuilderUtil {

    private static Logger LOG = LoggerFactory.getLogger(BuilderUtil.class);

    public static SearchRequestBuilder getSearchBuilder(TransportClient client, String index, String type){
        SearchRequestBuilder builder = null;
        try {
            if (StringUtils.isNotBlank(index)) {
                builder = client.prepareSearch(index.split(","));
            } else {
                builder = client.prepareSearch();
            }
            if (StringUtils.isNotBlank(type)) {
                builder.setTypes(type.split(","));
            }
        } catch (Exception e) {
            LOG.error(null, e);
        }
        return builder;
    }

    public static SearchRequestBuilder getSearchBuilder(TransportClient client, String[] indexs, String type){
        SearchRequestBuilder builder = null;
        try {
            if (indexs != null && indexs.length > 0) {
                //prepareSearch takes varargs, so all of the indices are searched at once
                builder = client.prepareSearch(indexs);
            } else {
                builder = client.prepareSearch();
            }
            if (StringUtils.isNotBlank(type)) {
                builder.setTypes(type);
            }
        } catch (Exception e) {
            LOG.error(null, e);
        }
        return builder;
    }
}

QueryUtil.java

package com.hsiehchou.es.search;

import com.hsiehchou.es.utils.UnicodeUtil;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Map;

public class QueryUtil {

    private static Logger LOG = LoggerFactory.getLogger(QueryUtil.class);

    /**
     * EQ   等於
     * NEQ  不等於
     * GE   大于等于
     * GT   大于
     * LE   小于等于
     * LT   小于
     * RANGE 区间范围
     */
    public static enum OPREATOR {EQ, NEQ,WILDCARD, GE, LE, GT, LT, FUZZY, RANGE, IN, PREFIX}

    /**
     * @param paramMap
     * @return
     */
    public static BoolQueryBuilder getSearchParam(Map<OPREATOR, Map<String, Object>> paramMap) {

        BoolQueryBuilder qb = QueryBuilders.boolQuery();

        if (null != paramMap && !paramMap.isEmpty()) {

            for (Map.Entry<OPREATOR, Map<String, Object>> paramEntry : paramMap.entrySet()) {

                OPREATOR key = paramEntry.getKey();
                Map<String, Object> fieldMap = paramEntry.getValue();

                for (Map.Entry<String, Object> fieldEntry : fieldMap.entrySet()) {

                    String field = fieldEntry.getKey();
                    Object value = fieldEntry.getValue();

                    switch (key) {
                        case EQ:/**等於查詢 equale**/
                            qb.must(QueryBuilders.matchPhraseQuery(field, value).slop(0));
                            break;
                        case NEQ:/**不等於查詢 not equale**/
                            qb.mustNot(QueryBuilders.matchQuery(field, value));
                            break;
                        case GE: /**大于等于查詢  great than or equal to**/
                            qb.must(QueryBuilders.rangeQuery(field).gte(value));
                            break;
                        case LE: /**小于等于查詢 less than or equal to**/
                            qb.must(QueryBuilders.rangeQuery(field).lte(value));
                            break;
                        case GT: /**大于查詢**/
                            qb.must(QueryBuilders.rangeQuery(field).gt(value));
                            break;
                        case LT: /**小于查詢**/
                            qb.must(QueryBuilders.rangeQuery(field).lt(value));
                            break;
                        case FUZZY:
                            String text = String.valueOf(value);
                            if (!UnicodeUtil.hasChinese(text)) {
                                text = "*" + text + "*";
                            }
                            text = QueryParser.escape(text);
                            qb.must(new QueryStringQueryBuilder(text).field(field));
                            break;

                        case RANGE: /**区间查詢**/
                            String[] split = value.toString().split(",");
                            if(split.length==2){
                                qb.must(QueryBuilders.rangeQuery(field).from(Long.valueOf(split[0]))
                                        .to(Long.valueOf(split[1])));
                            }
                             /*  if (value instanceof Map) {
                                Map<String, Object> rangMap = (Map<String, Object>) value;
                                qb.must(QueryBuilders.rangeQuery(field).from(rangMap.get("ge"))
                                        .to(rangMap.get("le")));
                            }*/
                            break;

                        case PREFIX: /**前缀查詢**/
                            qb.must(QueryBuilders.prefixQuery(field, String.valueOf(value)));
                            break;

                        case IN:
                            qb.must(QueryBuilders.termsQuery(field, (Object[]) value));
                            break;

                        default:
                            qb.must(QueryBuilders.matchQuery(field, value));
                            break;
                    }
                }
            }
        }
        return qb;
    }
}

ResponseParse.java

package com.hsiehchou.es.search;

import org.elasticsearch.action.get.GetResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Map;

public class ResponseParse {

    private static Logger LOG = LoggerFactory.getLogger(ResponseParse.class);

    public static Map<String, Object> parseGetResponse(GetResponse getResponse){
        Map<String, Object> source = null;
        try {
            source = getResponse.getSource();
        } catch (Exception e) {
            LOG.error(null,e);
        }
        return source;
    }
}

SearchUtil.java

package com.hsiehchou.es.search;

import com.hsiehchou.es.client.ESClientUtils;
import org.elasticsearch.action.get.GetRequestBuilder;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SearchUtil {

    private static Logger LOG = LoggerFactory.getLogger(SearchUtil.class);

    private static TransportClient client = ESClientUtils.getClient();

    public static void main(String[] args) {
        TransportClient client = ESClientUtils.getClient();
        List<Map<String, Object>> maps = searchSingleData(client, "wechat", "wechat", "phone_mac", "aa-aa-aa-aa-aa-aa");
        System.out.println(maps);
        /* long l = System.currentTimeMillis();
        searchSingleData("tanslator", "tanslator","4e1117d7-c434-48a7-9134-45f7c90f94ee_TR1100397895_2");
        System.out.println("消耗时间" + (System.currentTimeMillis() - l));

        long lll = System.currentTimeMillis();
        searchSingleData("tanslator", "tanslator","4e1117d7-c434-48a7-9134-45f7c90f94ee_TR1100397895_2");
        System.out.println("消耗时间" + (System.currentTimeMillis() - lll));

        long ll = System.currentTimeMillis();
        List<Map<String, Object>> maps = searchSingleData(client,"tanslator", "tanslator", "iolid", "TR1100397895");
        System.out.println("消耗时间" + (System.currentTimeMillis() - ll));
        System.out.println(maps);*/
    }

    /**
     * 根据文档ID查询单条数据
     * @param index  索引
     * @param type   类型
     * @param id     文档_id(不是普通字段)
     * @return GetResponse
     */
    public static GetResponse searchSingleData(String index, String type, String id) {
        GetResponse response = null;
        try {
            GetRequestBuilder builder = null;
            builder = client.prepareGet(index, type, id);
            response = builder.execute().actionGet();
        } catch (Exception e) {
            LOG.error(null, e);
        }
        return response;
    }

    /**
     * @param index
     * @param type
     * @param field
     * @param value
     * @return
     */
    public static List<Map<String, Object>> searchSingleData(TransportClient client,String index, String type,String field, String value) {
        List<Map<String, Object>> result = new ArrayList<>();
        try {
            SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client,index,type);
            MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery(field, value);
            builder.setQuery(matchQueryBuilder).setExplain(false);
            SearchResponse searchResponse = builder.execute().actionGet();
            SearchHits hits = searchResponse.getHits();
            SearchHit[] searchHists = hits.getHits();
            for (SearchHit sh : searchHists) {
                result.add(sh.getSourceAsMap());
            }
        } catch (Exception e) {
            e.printStackTrace();
            LOG.error(null, e);
        }
        return result;
    }

    /**
     * 多条件查詢
     * @param index
     * @param type
     * @param paramMap 组合查询条件
     * @return
     */
    public static SearchResponse searchListData(String index, String type,
                                                Map<QueryUtil.OPREATOR,Map<String,Object>> paramMap) {

        SearchRequestBuilder builder = BuilderUtil.getSearchBuilder(client,index,type);
        builder.setQuery(QueryUtil.getSearchParam(paramMap)).setExplain(false);
        SearchResponse searchResponse = builder.get();

        return searchResponse;
    }

    /**
     * 多条件查询(占位示例:仅演示 bool 查询中 must/should 的嵌套写法,尚未实现,始终返回 null)
     * @param index
     * @param type
     * @param paramMap 组合查询条件
     * @return
     */
    public static SearchResponse searchListData1(String index, String type, Map<String,String> paramMap) {

        BoolQueryBuilder qb = QueryBuilders.boolQuery();
        qb.must(QueryBuilders.matchQuery("", ""));

        BoolQueryBuilder qb1 = QueryBuilders.boolQuery();
        qb1.should(QueryBuilders.matchQuery("",""));
        qb1.should(QueryBuilders.matchQuery("",""));

        qb.must(qb1);
        return null;
    }
}
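
A quick end-to-end sketch of how QueryUtil and SearchUtil above fit together: it builds an OPREATOR condition map and runs it through searchListData. This is only a sketch; the index/type/field names ("wechat", "phone", "collect_time") and the sample values are assumptions taken from the data model earlier in the document, not values that are guaranteed to exist.

SearchUtilExample.java (usage sketch)

package com.hsiehchou.es.search;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHit;

import java.util.HashMap;
import java.util.Map;

public class SearchUtilExample {

    public static void main(String[] args) {
        Map<QueryUtil.OPREATOR, Map<String, Object>> paramMap = new HashMap<>();

        // EQ: phone must equal the given number (phrase match with slop 0)
        Map<String, Object> eqConditions = new HashMap<>();
        eqConditions.put("phone", "18609765432");
        paramMap.put(QueryUtil.OPREATOR.EQ, eqConditions);

        // RANGE: collect_time between two timestamps, passed as "from,to"
        Map<String, Object> rangeConditions = new HashMap<>();
        rangeConditions.put("collect_time", "1561651200,1561737600");
        paramMap.put(QueryUtil.OPREATOR.RANGE, rangeConditions);

        SearchResponse response = SearchUtil.searchListData("wechat", "wechat", paramMap);
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }
}

Because searchListData relies on the static client obtained from ESClientUtils, this sketch only runs against a reachable Elasticsearch cluster.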

6、utils

ESresultUtil.java

package com.hsiehchou.es.utils;

import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Map;

public class ESresultUtil {

    private static Logger LOG = LoggerFactory.getLogger(ESresultUtil.class);

    public static Long getLong(Map<String,Object> esMAp,String field){

        Long valueLong = 0L;
        if(esMAp!=null && esMAp.size()>0){
            if(esMAp.containsKey(field)){
                 Object value = esMAp.get(field);
                 if(value!=null && StringUtils.isNotBlank(value.toString())){
                     valueLong = Long.valueOf(value.toString());
                 }
            }
        }
        return valueLong;
    }
}

UnicodeUtil.java

package com.hsiehchou.es.utils;

import java.util.regex.Pattern;

public class UnicodeUtil {

    // 根据Unicode编码完美的判断中文汉字和符号
    private static boolean isChinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
                || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
                || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION) {
            return true;
        }
        return false;
    }

    // 判断字符串中是否包含中文字符(与下面的 hasChinese 行为相同,保留以兼容旧调用)
    public static boolean isChinese(String strName) {
        char[] ch = strName.toCharArray();
        for (int i = 0; i < ch.length; i++) {
            char c = ch[i];
            if (isChinese(c)) {
                return true;
            }
        }
        return false;
    }

    // 判断字符串中是否包含中文汉字和符号(任意一个字符命中即返回 true)
    public static boolean hasChinese(String strName) {
        char[] ch = strName.toCharArray();
        for (int i = 0; i < ch.length; i++) {
            char c = ch[i];
            if (isChinese(c)) {
                return true;
            }
        }
        return false;
    }

    // 只能判断部分CJK字符(CJK统一汉字)
    public static boolean isChineseByREG(String str) {
        if (str == null) {
            return false;
        }
        Pattern pattern = Pattern.compile("[\\u4E00-\\u9FBF]+");
        return pattern.matcher(str.trim()).find();
    }

    // 只能判断部分CJK字符(CJK统一汉字)
    /*    public static boolean isChineseByName(String str) {
        if (str == null) {
            return false;
        }
        // 大小写不同:\\p 表示包含,\\P 表示不包含
        // \\p{Cn} 的意思为 Unicode 中未被定义字符的编码,\\P{Cn} 就表示 Unicode中已经被定义字符的编码
        String reg = "\\p{InCJK Unified Ideographs}&&\\P{Cn}";
        Pattern pattern = Pattern.compile(reg);
        return pattern.matcher(str.trim()).find();
    }*/

    public static void main(String[] args) {
        System.out.println(hasChinese("aa表aa"));
    }
}

7、V2

ElasticSearchService.java

package com.hsiehchou.es.V2;

import com.hsiehchou.es.client.ESClientUtils;
import org.apache.commons.collections.map.HashedMap;
import org.apache.commons.lang.StringUtils;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsRequest;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortOrder;

import java.util.*;

/**
 *  ES检索封装
 */
public class ElasticSearchService {

    private final static int MAX = 10000;

    private static TransportClient client = ESClientUtils.getClient();

    /**
     * 功能描述:新建索引
     * @param indexName 索引名
     */
    public void createIndex(String indexName) {
        client.admin().indices().create(new CreateIndexRequest(indexName))
                .actionGet();
    }

    /**
     * 功能描述:新建索引
     * @param index 索引名
     * @param type 类型
     */
    public void createIndex(String index, String type) {
        client.prepareIndex(index, type).setSource().get();
    }

    /**
     * 功能描述:删除索引
     * @param index 索引名
     */
    public void deleteIndex(String index) {
        if (indexExist(index)) {
            DeleteIndexResponse dResponse = client.admin().indices().prepareDelete(index)
                    .execute().actionGet();
            if (!dResponse.isAcknowledged()) {

            }
        } else {

        }
    }

    /**
     * 功能描述:验证索引是否存在
     * @param index 索引名
     */
    public boolean indexExist(String index) {
        IndicesExistsRequest inExistsRequest = new IndicesExistsRequest(index);
        IndicesExistsResponse inExistsResponse = client.admin().indices()
                .exists(inExistsRequest).actionGet();
        return inExistsResponse.isExists();
    }

    /**
     * 功能描述:插入数据
     * @param index 索引名
     * @param type 类型
     * @param json 数据
     */
    public void insertData(String index, String type, String json) {
       client.prepareIndex(index, type)
                .setSource(json)
                .get();
    }

    /**
     * 功能描述:插入数据
     * @param index 索引名
     * @param type 类型
     * @param _id 数据id
     * @param json 数据
     */
    public void insertData(String index, String type, String _id, String json) {
        client.prepareIndex(index, type).setId(_id)
                .setSource(json)
                .get();
    }

    /**
     * 功能描述:更新数据
     * @param index 索引名
     * @param type 类型
     * @param _id 数据id
     * @param json 数据
     */
    public void updateData(String index, String type, String _id, String json) throws Exception {
        try {
            UpdateRequest updateRequest = new UpdateRequest(index, type, _id)
                    .doc(json);
            client.update(updateRequest).get();
        } catch (Exception e) {
            //throw new MessageException("update data failed.", e);
        }
    }

    /**
     * 功能描述:删除数据
     * @param index 索引名
     * @param type 类型
     * @param _id 数据id
     */
    public void deleteData(String index, String type, String _id) {
        client.prepareDelete(index, type, _id)
                .get();
    }

    /**
     * 功能描述:批量插入数据
     * @param index 索引名
     * @param type 类型
     * @param data (_id 主键, json 数据)
     */
    public void bulkInsertData(String index, String type, Map<String, String> data) {
        BulkRequestBuilder bulkRequest = client.prepareBulk();
        data.forEach((param1, param2) -> {
            bulkRequest.add(client.prepareIndex(index, type, param1)
                    .setSource(param2)
            );
        });
        bulkRequest.get();
    }

    /**
     * 功能描述:批量插入数据
     * @param index 索引名
     * @param type 类型
     * @param jsonList 批量数据
     */
    public void bulkInsertData(String index, String type, List<String> jsonList) {
        BulkRequestBuilder bulkRequest = client.prepareBulk();
        jsonList.forEach(item -> {
            bulkRequest.add(client.prepareIndex(index, type)
                    .setSource(item)
            );
        });
        bulkRequest.get();
    }

    /**
     * 功能描述:查询
     * @param index 索引名
     * @param type 类型
     * @param constructor 查询构造
     */
    public List<Map<String, Object>> search(String index, String type, ESQueryBuilderConstructor constructor) {

        List<Map<String, Object>> list = new ArrayList<>();
        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
        //排序
        if (StringUtils.isNotEmpty(constructor.getAsc()))
            searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
        if (StringUtils.isNotEmpty(constructor.getDesc()))
            searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
        //设置查询体
        searchRequestBuilder.setQuery(constructor.listBuilders());
        //返回条目数
        int size = constructor.getSize();
        if (size < 0) {
            size = 0;
        }
        if (size > MAX) {
            size = MAX;
        }
        //返回条目数
        searchRequestBuilder.setSize(size);
        searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
        SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
        SearchHits hits = searchResponse.getHits();
        SearchHit[] searchHists = hits.getHits();
        for (SearchHit sh : searchHists) {
            list.add(sh.getSourceAsMap());
        }
        return list;
    }


    /**
     * 功能描述:查询
     * @param index 索引名
     * @param type 类型
     * @param constructor 查询构造
     */
    public Map<String,Object> searchCountAndMessage(String index, String type, ESQueryBuilderConstructor constructor) {
        Map<String,Object> map = new HashMap<String,Object>();
        List<Map<String, Object>> list = new ArrayList<>();
        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
        //排序
        if (StringUtils.isNotEmpty(constructor.getAsc()))
            searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
        if (StringUtils.isNotEmpty(constructor.getDesc()))
            searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
        //设置查询体
        searchRequestBuilder.setQuery(constructor.listBuilders());
        //返回条目数
        int size = constructor.getSize();
        if (size < 0) {
            size = 0;
        }
        if (size > MAX) {
            size = MAX;
        }

        //返回条目数
        searchRequestBuilder.setSize(size);
        searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
        SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
        long totalHits = searchResponse.getHits().getTotalHits();

        SearchHits hits = searchResponse.getHits();
        SearchHit[] searchHists = hits.getHits();
        for (SearchHit sh : searchHists) {
            list.add(sh.getSourceAsMap());
        }
        map.put("total",(long)searchHists.length);
        map.put("count",totalHits);
        map.put("data",list);
        return map;
    }

    /**
     * 功能描述:查询
     * @param index 索引名
     * @param type 类型
     * @param constructor 查询构造
     */
    public Map<String,Object> searchCountAndMessageNew(String index, String type, ESQueryBuilderConstructorNew constructor) {
        Map<String,Object> map = new HashMap<String,Object>();
        List<Map<String, Object>> list = new ArrayList<>();
        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);

        //排序
        List<SortBuilder> sortBuilderList = constructor.getSortBuilderList();
        if(sortBuilderList!=null && sortBuilderList.size()>0){
            sortBuilderList.forEach(sortBuilder->{
                searchRequestBuilder.addSort(sortBuilder);
            });
        }

        //设置查询体
        searchRequestBuilder.setQuery(constructor.listBuilders());

        //返回条目数
        int size = constructor.getSize();
        if (size < 0) {
            size = 0;
        }
        if (size > MAX) {
            size = MAX;
        }
        //返回条目数
        searchRequestBuilder.setSize(size);
        searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());

        //设置高亮
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        List<String> highLighterFields = constructor.getHighLighterFields();
        if(highLighterFields.size()>0){
            highLighterFields.forEach(field -> {
                highlightBuilder.field(field);
            });

        }

        highlightBuilder.preTags("<font color=\"red\">");
        highlightBuilder.postTags("</font>");
        SearchResponse searchResponse = searchRequestBuilder.highlighter(highlightBuilder).execute().actionGet();
        long totalHits = searchResponse.getHits().getTotalHits();

        SearchHits hits = searchResponse.getHits();
        SearchHit[] searchHists = hits.getHits();
        for (SearchHit hit : searchHists) {

            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();

            //获取高亮结果
            Set<String> set = highlightFields.keySet();

            for (String str : set) {
                Text[] fragments = highlightFields.get(str).getFragments();
                String st1r="";
                for(Text text:fragments){
                    st1r = st1r + text.toString();
                }
                sourceAsMap.put(str,st1r);
                System.out.println("str(==============" + st1r);
            }

            list.add(sourceAsMap);
        }
        map.put("total",(long)searchHists.length);
        map.put("count",totalHits);
        map.put("data",list);
        return map;
    }

    /**
     * 功能描述:统计查询
     * @param index 索引名
     * @param type 类型
     * @param constructor 查询构造
     * @param groupBy 统计字段
     */
    public Map<Object, Object> statSearch(String index, String type, ESQueryBuilderConstructor constructor, String groupBy) {
        Map<Object, Object> map = new HashedMap();
        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
        //排序
        if (StringUtils.isNotEmpty(constructor.getAsc()))
            searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
        if (StringUtils.isNotEmpty(constructor.getDesc()))
            searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
        //设置查询体
        if (null != constructor) {
            searchRequestBuilder.setQuery(constructor.listBuilders());
        } else {
            searchRequestBuilder.setQuery(QueryBuilders.matchAllQuery());
        }
        int size = constructor.getSize();
        if (size < 0) {
            size = 0;
        }
        if (size > MAX) {
            size = MAX;
        }
        //返回条目数
        searchRequestBuilder.setSize(size);

        searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
        SearchResponse sr = searchRequestBuilder.addAggregation(
                AggregationBuilders.terms("agg").field(groupBy)
        ).get();

        Terms stateAgg = sr.getAggregations().get("agg");

        Iterator<? extends Terms.Bucket> iter = stateAgg.getBuckets().iterator();

        while (iter.hasNext()) {
            Terms.Bucket gradeBucket = iter.next();
            map.put(gradeBucket.getKey(), gradeBucket.getDocCount());
        }

        return map;
    }

    /**
     * 功能描述:统计查询
     * @param index 索引名
     * @param type 类型
     * @param constructor 查询构造
     * @param agg 自定义计算
     */
    public Map<Object, Object> statSearch(String index, String type, ESQueryBuilderConstructor constructor, AggregationBuilder agg) {
        if (agg == null) {
            return null;
        }
        Map<Object, Object> map = new HashedMap();
        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index).setTypes(type);
        //排序
        if (StringUtils.isNotEmpty(constructor.getAsc()))
            searchRequestBuilder.addSort(constructor.getAsc(), SortOrder.ASC);
        if (StringUtils.isNotEmpty(constructor.getDesc()))
            searchRequestBuilder.addSort(constructor.getDesc(), SortOrder.DESC);
        //设置查询体
        if (null != constructor) {
            searchRequestBuilder.setQuery(constructor.listBuilders());
        } else {
            searchRequestBuilder.setQuery(QueryBuilders.matchAllQuery());
        }
        int size = constructor.getSize();
        if (size < 0) {
            size = 0;
        }
        if (size > MAX) {
            size = MAX;
        }
        //返回条目数
        searchRequestBuilder.setSize(size);

        searchRequestBuilder.setFrom(constructor.getFrom() < 0 ? 0 : constructor.getFrom());
        SearchResponse sr = searchRequestBuilder.addAggregation(
                agg
        ).get();

        Terms stateAgg = sr.getAggregations().get("agg");
        Iterator<? extends Terms.Bucket> iter = stateAgg.getBuckets().iterator();

        while (iter.hasNext()) {
            Terms.Bucket gradeBucket = iter.next();
            map.put(gradeBucket.getKey(), gradeBucket.getDocCount());
        }
        return map;
    }

    /**
     * 功能描述:关闭链接
     */
    public void close() {
        client.close();
    }

    public static void test() {
        try{
            ElasticSearchService service = new ElasticSearchService();
            ESQueryBuilderConstructorNew constructor = new ESQueryBuilderConstructorNew();
            constructor.must(new ESQueryBuilders().bool(QueryBuilders.boolQuery()));
            constructor.must(new ESQueryBuilders().match("secondlanguage", "4"));
            constructor.must(new ESQueryBuilders().match("secondlanguage", "4"));
            constructor.should(new ESQueryBuilders().match("source", "5"));
            constructor.should(new ESQueryBuilders().match("source", "5"));
            service.searchCountAndMessageNew("", "", constructor);
        }catch (Exception e){
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            ElasticSearchService service = new ElasticSearchService();
            ESQueryBuilderConstructor constructor = new ESQueryBuilderConstructor();

         /*   constructor.must(new ESQueryBuilders().term("gender", "f").range("age", 20, 50));

            constructor.should(new ESQueryBuilders().term("gender", "f").range("age", 20, 50).fuzzy("age", 20));
            constructor.mustNot(new ESQueryBuilders().term("gender", "m"));
            constructor.setSize(15);  //查询返回条数,最大 10000
            constructor.setFrom(11);  //分页查询条目起始位置, 默认0
            constructor.setAsc("age"); //排序

            List<Map<String, Object>> list = service.search("bank", "account", constructor);
            Map<Object, Object> map = service.statSearch("bank", "account", constructor, "state");*/

            constructor.must(new ESQueryBuilders().match("id", "WE16000190TR"));
            List<Map<String, Object>> list = service.search("test01", "test01", constructor);
             for(Map<String, Object> map : list){
                 System.out.println(map);
             }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
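
The main method above only exercises the plain search path; the highlighted, sorted and paged path of searchCountAndMessageNew is easier to see with an explicit constructor. Below is a minimal sketch: the index/type name ("wechat") and the fields being matched, highlighted and sorted on ("phone_mac", "collect_time") are assumptions from the sample data model.

ElasticSearchServiceExample.java (usage sketch)

package com.hsiehchou.es.V2;

import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class ElasticSearchServiceExample {

    public static void main(String[] args) {
        ElasticSearchService service = new ElasticSearchService();

        ESQueryBuilderConstructorNew constructor = new ESQueryBuilderConstructorNew();
        constructor.must(new ESQueryBuilders().match("phone_mac", "aa-aa-aa-aa-aa-aa"));
        constructor.setSize(10);
        constructor.setFrom(0);

        // fields whose fragments are wrapped in <font color="red"> tags in the result
        constructor.setHighLighterFields(new ArrayList<>(Arrays.asList("phone_mac")));

        // newest records first
        List<SortBuilder> sortBuilders = new ArrayList<>();
        sortBuilders.add(SortBuilders.fieldSort("collect_time").order(SortOrder.DESC));
        constructor.setSortBuilderList(sortBuilders);

        Map<String, Object> result = service.searchCountAndMessageNew("wechat", "wechat", constructor);
        System.out.println("count = " + result.get("count")); // total hits matching the query
        System.out.println("total = " + result.get("total")); // hits returned on this page
        System.out.println("data  = " + result.get("data"));  // source maps with highlight fragments merged in
    }
}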

ESCriterion.java

package com.hsiehchou.es.V2;

import org.elasticsearch.index.query.QueryBuilder;

import java.util.List;

/**
 * 条件接口
 */
public interface ESCriterion {

    public enum Operator {
        PREFIX,             /**根据字段前缀查询**/
        MATCH,              /**匹配查询**/
        MATCH_PHRASE,       /**精确匹配**/
        MULTI_MATCH,        /**多字段匹配**/

        TERM,               /**term查询**/
        TERMS,              /**terms多值查询**/

        RANGE,              /**范围查询**/
        GTE,                /**大于等于查询**/
        LTE,                /**小于等于查询**/

        FUZZY,              /**模糊查询**/
        QUERY_STRING,       /**query_string查询**/
        MISSING ,           /**字段缺失查询**/

        BOOL
    }

    public enum MatchMode {
        START, END, ANYWHERE
    }

    public enum Projection {
        MAX, MIN, AVG, LENGTH, SUM, COUNT
    }

    public List<QueryBuilder> listBuilders();
}

ESQueryBuilderConstructor.java

package com.hsiehchou.es.V2;

import org.apache.commons.collections.CollectionUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

import java.util.ArrayList;
import java.util.List;

/**
 * 查询条件容器
 */
public class ESQueryBuilderConstructor {

    private int size = Integer.MAX_VALUE;

    private int from = 0;

    private String asc;

    private String desc;

    //查询条件容器
    private List<ESCriterion> mustCriterions = new ArrayList<ESCriterion>();
    private List<ESCriterion> shouldCriterions = new ArrayList<ESCriterion>();
    private List<ESCriterion> mustNotCriterions = new ArrayList<ESCriterion>();

    //构造builder
    public QueryBuilder listBuilders() {
        int count = mustCriterions.size() + shouldCriterions.size() + mustNotCriterions.size();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        QueryBuilder queryBuilder = null;

        if (count >= 1) {
            //must容器
            if (!CollectionUtils.isEmpty(mustCriterions)) {
                for (ESCriterion criterion : mustCriterions) {
                    for (QueryBuilder builder : criterion.listBuilders()) {
                        queryBuilder = boolQueryBuilder.must(builder);
                    }
                }
            }

            //should容器
            if (!CollectionUtils.isEmpty(shouldCriterions)) {
                for (ESCriterion criterion : shouldCriterions) {
                    for (QueryBuilder builder : criterion.listBuilders()) {
                        queryBuilder = boolQueryBuilder.should(builder);
                    }
                }
            }

            //must not 容器
            if (!CollectionUtils.isEmpty(mustNotCriterions)) {
                for (ESCriterion criterion : mustNotCriterions) {
                    for (QueryBuilder builder : criterion.listBuilders()) {
                        queryBuilder = boolQueryBuilder.mustNot(builder);
                    }
                }
            }
            return queryBuilder;
        } else {
            return null;
        }
    }

    /**
     * 增加简单条件表达式
     */
    public ESQueryBuilderConstructor must(ESCriterion criterion){
        if(criterion!=null){
            mustCriterions.add(criterion);
        }
        return this;
    }

    /**
     * 增加简单条件表达式
     */
    public ESQueryBuilderConstructor should(ESCriterion criterion){
        if(criterion!=null){
            shouldCriterions.add(criterion);
        }
        return this;
    }

    /**
     * 增加简单条件表达式
     */
    public ESQueryBuilderConstructor mustNot(ESCriterion criterion){
        if(criterion!=null){
            mustNotCriterions.add(criterion);
        }
        return this;
    }


    public int getSize() {
        return size;
    }

    public void setSize(int size) {
        this.size = size;
    }

    public String getAsc() {
        return asc;
    }

    public void setAsc(String asc) {
        this.asc = asc;
    }

    public String getDesc() {
        return desc;
    }

    public void setDesc(String desc) {
        this.desc = desc;
    }

    public int getFrom() {
        return from;
    }

    public void setFrom(int from) {
        this.from = from;
    }
}

ESQueryBuilderConstructorNew.java

package com.hsiehchou.es.V2;

import org.apache.commons.collections.CollectionUtils;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortBuilder;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * 查询条件容器
 */
public class ESQueryBuilderConstructorNew {

    private List<String> highLighterFields = new ArrayList<String>();

    private int size = Integer.MAX_VALUE;

    private int from = 0;

    private List<SortBuilder> sortBuilderList;

    public List<SortBuilder> getSortBuilderList() {
        return sortBuilderList;
    }

    public void setSortBuilderList(List<SortBuilder> sortBuilderList) {
        this.sortBuilderList = sortBuilderList;
    }

    private Map<String,List<String>> sortMap;

    //查询条件容器
    private List<ESCriterion> mustCriterions = new ArrayList<ESCriterion>();
    private List<ESCriterion> shouldCriterions = new ArrayList<ESCriterion>();
    private List<ESCriterion> mustNotCriterions = new ArrayList<ESCriterion>();

    //构造builder
    public QueryBuilder listBuilders() {
        int count = mustCriterions.size() + shouldCriterions.size() + mustNotCriterions.size();

        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        QueryBuilder queryBuilder = null;

        if (count >= 1) {
            //must容器
            if (!CollectionUtils.isEmpty(mustCriterions)) {
                for (ESCriterion criterion : mustCriterions) {
                    for (QueryBuilder builder : criterion.listBuilders()) {
                        queryBuilder = boolQueryBuilder.must(builder);
                    }
                }
            }

            //should容器
            if (!CollectionUtils.isEmpty(shouldCriterions)) {
                for (ESCriterion criterion : shouldCriterions) {
                    for (QueryBuilder builder : criterion.listBuilders()) {
                        queryBuilder = boolQueryBuilder.should(builder);
                    }
                }
            }

            //must not 容器
            if (!CollectionUtils.isEmpty(mustNotCriterions)) {
                for (ESCriterion criterion : mustNotCriterions) {
                    for (QueryBuilder builder : criterion.listBuilders()) {
                        queryBuilder = boolQueryBuilder.mustNot(builder);
                    }
                }
            }
            return queryBuilder;
        } else {
            return null;
        }
    }

    /**
     * 增加简单条件表达式
     */
    public ESQueryBuilderConstructorNew must(ESCriterion criterion){
        if(criterion!=null){
            mustCriterions.add(criterion);
        }
        return this;
    }

    /**
     * 增加简单条件表达式
     */
    public ESQueryBuilderConstructorNew should(ESCriterion criterion){
        if(criterion!=null){
            shouldCriterions.add(criterion);
        }
        return this;
    }
    /**
     * 增加简单条件表达式
     */
    public ESQueryBuilderConstructorNew mustNot(ESCriterion criterion){
        if(criterion!=null){
            mustNotCriterions.add(criterion);
        }
        return this;
    }

    public List<String> getHighLighterFields() {
        return highLighterFields;
    }

    public void setHighLighterFields(List<String> highLighterFields) {
        this.highLighterFields = highLighterFields;
    }

    public int getSize() {
        return size;
    }

    public void setSize(int size) {
        this.size = size;
    }

    public Map<String, List<String>> getSortMap() {
        return sortMap;
    }

    public void setSortMap(Map<String, List<String>> sortMap) {
        this.sortMap = sortMap;
    }

    public int getFrom() {
        return from;
    }

    public void setFrom(int from) {
        this.from = from;
    }
}

ESQueryBuilders.java

package com.hsiehchou.es.V2;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.NestedQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * 条件构造器
 */
public class ESQueryBuilders implements ESCriterion{

    private List<QueryBuilder> list = new ArrayList<QueryBuilder>();

    /**
     * 功能描述:match 查询
     * @param field 字段名
     * @param value 值
     */
    public ESQueryBuilders match(String field, Object value) {
        list.add(new ESSimpleExpression (field, value, Operator.MATCH).toBuilder());
        return this;
    }

    /**
     * 功能描述:match_phrase 短语匹配查询
     * @param field 字段名
     * @param value 值
     */
    public ESQueryBuilders match_phrase(String field, Object value) {
        list.add(new ESSimpleExpression (field, value, Operator.MATCH_PHRASE).toBuilder());
        return this;
    }

    /**
     * 功能描述:multi_match 多字段匹配查询
     * @param fieldNames 字段名
     * @param value 值
     */
    public ESQueryBuilders multi_match(Object value , String... fieldNames ) {
        String[] fields = fieldNames;
        list.add(new ESSimpleExpression (value, Operator.MULTI_MATCH,fields).toBuilder());
        return this;
    }

    /**
     * 功能描述:Term 查询
     * @param field 字段名
     * @param value 值
     */
    public ESQueryBuilders term(String field, Object value) {
        list.add(new ESSimpleExpression (field, value, Operator.TERM).toBuilder());
        return this;
    }

    /**
     * 功能描述:Terms 查询
     * @param field 字段名
     * @param values 集合值
     */
    public ESQueryBuilders terms(String field, Collection<Object> values) {
        list.add(new ESSimpleExpression (field, values).toBuilder());
        return this;
    }

    /**
     * 功能描述:fuzzy 查询
     * @param field 字段名
     * @param value 值
     */
    public ESQueryBuilders fuzzy(String field, Object value) {
        list.add(new ESSimpleExpression (field, value, Operator.FUZZY).toBuilder());
        return this;
    }

    /**
     * 功能描述:Range 查询
     * @param from 起始值
     * @param to 末尾值
     */
    public ESQueryBuilders range(String field, Object from, Object to) {
        list.add(new ESSimpleExpression (field, from, to).toBuilder());
        return this;
    }

    /**
     * 功能描述:GTE 大于等于查询
     * @param
     */
    public ESQueryBuilders gte(String field, Object num) {
        list.add(new ESSimpleExpression (field, num,Operator.GTE).toBuilder());
        return this;
    }

    /**
     * 功能描述:LTE 小于等于查询
     * @param
     */
    public ESQueryBuilders lte(String field, Object num) {
        list.add(new ESSimpleExpression (field, num,Operator.LTE).toBuilder());
        return this;
    }

    /**
     * 功能描述:prefix 查询
     * @param field 字段名
     * @param value 值
     */
    public ESQueryBuilders prefix(String field, Object value) {
        list.add(new ESSimpleExpression (field, value, Operator.PREFIX).toBuilder());
        return this;
    }

    /**
     * 功能描述:query_string 查询
     * @param queryString 查询语句
     */
    public ESQueryBuilders queryString(String queryString) {
        list.add(new ESSimpleExpression (queryString, Operator.QUERY_STRING).toBuilder());
        return this;
    }

    /**
     * 功能描述:bool 组合查询
     * @param
     */
    public ESQueryBuilders bool(BoolQueryBuilder boolQueryBuilder) {
        list.add(boolQueryBuilder);
        return this;
    }

    public ESQueryBuilders nested(NestedQueryBuilder nestedQueryBuilder) {
        list.add(nestedQueryBuilder);
        return this;
    }

    public List<QueryBuilder> listBuilders() {
        return list;
    }
}

ESSimpleExpression.java

package com.hsiehchou.es.V2;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

import java.util.Collection;
import com.hsiehchou.es.V2.ESCriterion.Operator;

import static org.elasticsearch.index.search.MatchQuery.Type.PHRASE;

/**
 * 条件表达式
 */
public class ESSimpleExpression {

    private String[] fieldNames;         //属性名
    private String fieldName;         //属性名
    private Object value;             //对应值
    private Collection<Object> values;//对应值
    private Operator operator;        //计算符
    private Object from;
    private Object to;

    protected  ESSimpleExpression() {
    }

    protected  ESSimpleExpression(Object value, Operator operator,String... fieldNames) {
        this.fieldNames = fieldNames;
        this.value = value;
        this.operator = operator;
    }


    protected  ESSimpleExpression(String fieldName, Object value, Operator operator) {
        this.fieldName = fieldName;
        this.value = value;
        this.operator = operator;
    }

    protected  ESSimpleExpression(String value, Operator operator) {
        this.value = value;
        this.operator = operator;
    }

    protected ESSimpleExpression(String fieldName, Collection<Object> values) {
        this.fieldName = fieldName;
        this.values = values;
        this.operator = Operator.TERMS;
    }

    protected ESSimpleExpression(String fieldName, Object from, Object to) {
        this.fieldName = fieldName;
        this.from = from;
        this.to = to;
        this.operator = Operator.RANGE;
    }

    public BoolQueryBuilder toBoolQueryBuilder(){
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.mustNot(QueryBuilders.matchQuery("",""));
        boolQueryBuilder.mustNot(QueryBuilders.matchQuery("",""));

        return null;
    }

    public QueryBuilder toBuilder() {
        QueryBuilder qb = null;
        switch (operator) {
            case MATCH:
                qb = QueryBuilders.matchQuery(fieldName, value);
                break;
            case MATCH_PHRASE:
                qb = QueryBuilders.matchPhraseQuery(fieldName, value);
                break;
            case MULTI_MATCH:
                qb = QueryBuilders.multiMatchQuery(value,fieldNames).type(PHRASE);
                break;
            case TERM:
                qb = QueryBuilders.termQuery(fieldName, value);
                break;
            case TERMS:
                qb = QueryBuilders.termsQuery(fieldName, values);
                break;
            case RANGE:
                qb = QueryBuilders.rangeQuery(fieldName).from(from).to(to).includeLower(true).includeUpper(true);
                break;
            case GTE:
                qb = QueryBuilders.rangeQuery(fieldName).gte(value);
                break;
            case LTE:
                qb = QueryBuilders.rangeQuery(fieldName).lte(value);
                break;
            case FUZZY:
                qb = QueryBuilders.fuzzyQuery(fieldName, value);
                break;
            case PREFIX:
                qb = QueryBuilders.prefixQuery(fieldName, value.toString());
                break;
            case QUERY_STRING:
                qb = QueryBuilders.queryStringQuery(value.toString());
                break;
            default:
        }
        return qb;
    }
}

九、Alerting(预警)

Alert rules are configured from the backend or the UI, saved to MySQL, and then synchronized to Redis.

With large data volumes, checking every incoming record against MySQL is far too slow, so the rules are cached in the in-memory store Redis and each record is compared against that cache directly when deciding whether to raise an alert.

Alerting workflow (diagram)

Alerting process (diagram)

MySQL needs two tables:
a rule table, which stores the alerting rules
a message table, which stores the generated alert messages

1、Create the rule table (rules are published from the UI)

The rules are stored in MySQL first; a scheduled task then synchronizes them into Redis (a minimal sketch of such a sync task is shown after XZ_RuleDao in step 4).
Create the table directly in the test database with the script
xz_rule.sql

SET FOREIGN_KEY_CHECKS=0;

DROP TABLE IF EXISTS `xz_rule`;
CREATE TABLE `xz_rule` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `warn_fieldname` varchar(20) DEFAULT NULL,
  `warn_fieldvalue` varchar(255) DEFAULT NULL,
  `publisher` varchar(255) DEFAULT NULL,
  `send_type` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `send_mobile` varchar(255) DEFAULT NULL,
  `send_mail` varchar(255) DEFAULT NULL,
  `send_dingding` varchar(255) DEFAULT NULL,
  `create_time` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;

INSERT INTO `xz_rule` VALUES ('1', 'phone', '18609765432', '?????1', '2', '13724536789', '1782324@qq.com', '32143243', '2019-06-28');

2、Create the message table

  1. Stores the alert messages so that the UI can poll and refresh them periodically, or display them as a scrolling ticker
  2. Used for alert-message statistics
(a query sketch covering both uses follows WarningMessageDao in step 4)

warn_message.sql

SET FOREIGN_KEY_CHECKS=0;

DROP TABLE IF EXISTS `warn_message`;
CREATE TABLE `warn_message` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `alarmRuleid` varchar(255) DEFAULT NULL,
  `alarmType` varchar(255) DEFAULT NULL,
  `sendType` varchar(255) DEFAULT NULL,
  `sendMobile` varchar(255) DEFAULT NULL,
  `sendEmail` varchar(255) DEFAULT NULL,
  `sendStatus` varchar(255) DEFAULT NULL,
  `senfInfo` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `hitTime` datetime DEFAULT NULL,
  `checkinTime` datetime DEFAULT NULL,
  `isRead` varchar(255) DEFAULT NULL,
  `readAccounts` varchar(255) DEFAULT NULL,
  `alarmaccounts` varchar(255) DEFAULT NULL,
  `accountid` varchar(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=31 DEFAULT CHARSET=latin1;

3、Create the database connection utility class

Create the package com.hsiehchou.common.netb.db
and the DBCommon class inside it.

DBCommon.java

package com.hsiehchou.common.netb.db;

import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.Properties;

public class DBCommon {

    private static Logger LOG = LoggerFactory.getLogger(DBCommon.class);
    private static String MYSQL_PATH = "common/mysql.properties";
    private static Properties properties = ConfigUtil.getInstance().getProperties(MYSQL_PATH);

    private static Connection conn ;
    private DBCommon(){}

    public static void main(String[] args) {
        System.out.println(properties);
        Connection xz_bigdata = DBCommon.getConn("test");
        System.out.println(xz_bigdata);
    }

    //TODO  配置文件
    private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
    private static final String USER_NAME = properties.getProperty("user");
    private static final String PASSWORD = properties.getProperty("password");
    private static final String IP = properties.getProperty("db_ip");
    private static final String PORT = properties.getProperty("db_port");
    private static final String DB_CONFIG = "?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull&autoReconnect=true&failOverReadOnly=false";

    static {
        try {
            Class.forName(JDBC_DRIVER);
        } catch (ClassNotFoundException e) {
            LOG.error(null, e);
        }
    }

    /**
     * 获取数据库连接
     * @param dbName
     * @return
     */
    public static Connection getConn(String dbName) {
        Connection conn = null;
        String  connstring = "jdbc:mysql://"+IP+":"+PORT+"/"+dbName+DB_CONFIG;
        try {
            conn = DriverManager.getConnection(connstring, USER_NAME, PASSWORD);
        } catch (SQLException e) {
            e.printStackTrace();
            LOG.error(null, e);
        }
        return conn;
    }

    /**
     * @param url eg:"jdbc:oracle:thin:@172.16.1.111:1521:d406"
     * @param driver eg:"oracle.jdbc.driver.OracleDriver"
     * @param user eg:"ucase"
     * @param password eg:"ucase123"
     * @return
     * @throws ClassNotFoundException
     * @throws SQLException
     */
    public static Connection getConn(String url, String driver, String user,
                                     String password) throws ClassNotFoundException, SQLException{
        Class.forName(driver);
        conn = DriverManager.getConnection(url, user, password);
        return  conn;
    }

    public static void close(Connection conn){
        try {
            if( conn != null ){
                conn.close();
            }
        } catch (SQLException e) {
            LOG.error(null,e);
        }
    }

    public static void close(Statement statement){
        try {
            if( statement != null ){
                statement.close();
            }
        } catch (SQLException e) {
            LOG.error(null,e);
        }
    }

    public static void close(Connection conn,PreparedStatement statement){
        try {
            if( conn != null ){
                conn.close();
            }
            if( statement != null ){
                statement.close();
            }
        } catch (SQLException e) {
            LOG.error(null,e);
        }
    }

    public static void close(Connection conn,Statement statement,ResultSet resultSet) throws SQLException{

        if( resultSet != null ){
            resultSet.close();
        }
        if( statement != null ){
            statement.close();
        }
        if( conn != null ){
            conn.close();
        }
    }
}

Add the Maven dependency for commons-dbutils:

<dependency>
    <groupId>commons-dbutils</groupId>
    <artifactId>commons-dbutils</artifactId>
    <version>${commons-dbutils.version}</version>
</dependency>

4、Create the entity classes and DAOs

Create the package com.hsiehchou.spark.warn.domain
and the classes XZ_RuleDomain and WarningMessage inside it.

XZ_RuleDomain.java

package com.hsiehchou.spark.warn.domain;

import java.sql.Date;

public class XZ_RuleDomain {

    private int id;
    private String warn_fieldname;   //预警字段
    private String warn_fieldvalue; //预警内容
    private String publisher;       //发布者
    private String send_type;       //消息接收方式
    private String send_mobile;     //接收手机号
    private String send_mail;       //接收邮箱
    private String send_dingding;   //接收钉钉
    private Date create_time;       //创建时间


    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getWarn_fieldname() {
        return warn_fieldname;
    }

    public void setWarn_fieldname(String warn_fieldname) {
        this.warn_fieldname = warn_fieldname;
    }

    public String getWarn_fieldvalue() {
        return warn_fieldvalue;
    }

    public void setWarn_fieldvalue(String warn_fieldvalue) {
        this.warn_fieldvalue = warn_fieldvalue;
    }

    public String getPublisher() {
        return publisher;
    }

    public void setPublisher(String publisher) {
        this.publisher = publisher;
    }

    public String getSend_type() {
        return send_type;
    }

    public void setSend_type(String send_type) {
        this.send_type = send_type;
    }

    public String getSend_mobile() {
        return send_mobile;
    }

    public void setSend_mobile(String send_mobile) {
        this.send_mobile = send_mobile;
    }

    public String getSend_mail() {
        return send_mail;
    }

    public void setSend_mail(String send_mail) {
        this.send_mail = send_mail;
    }

    public String getSend_dingding() {
        return send_dingding;
    }

    public void setSend_dingding(String send_dingding) {
        this.send_dingding = send_dingding;
    }

    public Date getCreate_time() {
        return create_time;
    }

    public void setCreate_time(Date create_time) {
        this.create_time = create_time;
    }
}

WarningMessage.java

package com.hsiehchou.spark.warn.domain;

import java.sql.Date;

public class WarningMessage {
    private String id;            //主键id
    private String alarmRuleid;   //规则id
    private String alarmType;     //告警类型
    private String sendType;      //发送方式
    private String sendMobile;    //发送至手机
    private String sendEmail;     //发送至邮箱
    private String sendStatus;    //发送状态
    private String senfInfo;      //发送内容
    private Date hitTime;         //命中时间
    private Date checkinTime;     //入库时间
    private String isRead;        //是否已读
    private String readAccounts;  //已读用户
    private String alarmaccounts;
    private String accountid;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getAlarmRuleid() {
        return alarmRuleid;
    }

    public void setAlarmRuleid(String alarmRuleid) {
        this.alarmRuleid = alarmRuleid;
    }

    public String getAlarmType() {
        return alarmType;
    }

    public void setAlarmType(String alarmType) {
        this.alarmType = alarmType;
    }

    public String getSendType() {
        return sendType;
    }

    public void setSendType(String sendType) {
        this.sendType = sendType;
    }

    public String getSendMobile() {
        return sendMobile;
    }

    public void setSendMobile(String sendMobile) {
        this.sendMobile = sendMobile;
    }

    public String getSendEmail() {
        return sendEmail;
    }

    public void setSendEmail(String sendEmail) {
        this.sendEmail = sendEmail;
    }

    public String getSendStatus() {
        return sendStatus;
    }

    public void setSendStatus(String sendStatus) {
        this.sendStatus = sendStatus;
    }

    public String getSenfInfo() {
        return senfInfo;
    }

    public void setSenfInfo(String senfInfo) {
        this.senfInfo = senfInfo;
    }

    public Date getHitTime() {
        return hitTime;
    }

    public void setHitTime(Date hitTime) {
        this.hitTime = hitTime;
    }

    public Date getCheckinTime() {
        return checkinTime;
    }

    public void setCheckinTime(Date checkinTime) {
        this.checkinTime = checkinTime;
    }

    public String getIsRead() {
        return isRead;
    }

    public void setIsRead(String isRead) {
        this.isRead = isRead;
    }

    public String getReadAccounts() {
        return readAccounts;
    }

    public void setReadAccounts(String readAccounts) {
        this.readAccounts = readAccounts;
    }

    public String getAlarmaccounts() {
        return alarmaccounts;
    }

    public void setAlarmaccounts(String alarmaccounts) {
        this.alarmaccounts = alarmaccounts;
    }

    public String getAccountid() {
        return accountid;
    }

    public void setAccountid(String accountid) {
        this.accountid = accountid;
    }

    @Override
    public String toString() {
        return "WarningMessage{" +
                "id='" + id + '\'' +
                ", alarmRuleid='" + alarmRuleid + '\'' +
                ", alarmType='" + alarmType + '\'' +
                ", sendType='" + sendType + '\'' +
                ", sendMobile='" + sendMobile + '\'' +
                ", sendEmail='" + sendEmail + '\'' +
                ", sendStatus='" + sendStatus + '\'' +
                ", senfInfo='" + senfInfo + '\'' +
                ", hitTime=" + hitTime +
                ", checkinTime=" + checkinTime +
                ", isRead='" + isRead + '\'' +
                ", readAccounts='" + readAccounts + '\'' +
                ", alarmaccounts='" + alarmaccounts + '\'' +
                ", accountid='" + accountid + '\'' +
                '}';
    }
}

Create the package com.hsiehchou.spark.warn.dao
and the classes XZ_RuleDao and WarningMessageDao inside it.

XZ_RuleDao.java

package com.hsiehchou.spark.warn.dao;

import com.hsiehchou.common.netb.db.DBCommon;
import com.hsiehchou.spark.warn.domain.XZ_RuleDomain;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.BeanListHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class XZ_RuleDao {

    private static final Logger LOG = LoggerFactory.getLogger(XZ_RuleDao.class);

    /**
     *  获取所有的规则
     * @return
     */
    public static List<XZ_RuleDomain> getRuleList(){
        List<XZ_RuleDomain> listRules = null;

        //获取连接
        Connection conn = DBCommon.getConn("test");

        //执行器
        QueryRunner query = new QueryRunner();
        String sql = "select * from xz_rule";
        try {
            listRules = query.query(conn,sql,new BeanListHandler<>(XZ_RuleDomain.class));
        } catch (SQLException e) {
            LOG.error(null,e);
        }finally {
            DBCommon.close(conn);
        }
        return listRules;
    }

    public static void main(String[] args) {
        List<XZ_RuleDomain> ruleList = XZ_RuleDao.getRuleList();
        System.out.println(ruleList.size());
        ruleList.forEach(x->{
            System.out.println(x);
        });
    }
}
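
With XZ_RuleDao in place, the scheduled MySQL-to-Redis synchronization mentioned in the section introduction can be sketched. This is only a sketch under assumptions: the Redis host/port and the choice of database 15 (suggested by the jedis15 variable used later in BlackRuleWarning) are not specified in the source; the key layout "warn_fieldname:warn_fieldvalue" pointing to a hash with id/send_type/send_mobile mirrors what BlackRuleWarning reads back with exists() and hgetAll().

RuleSyncTask.java (sketch)

package com.hsiehchou.spark.warn.service;

import com.hsiehchou.spark.warn.dao.XZ_RuleDao;
import com.hsiehchou.spark.warn.domain.XZ_RuleDomain;
import redis.clients.jedis.Jedis;

import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

public class RuleSyncTask {

    public static void syncRulesToRedis() {
        Jedis jedis = new Jedis("localhost", 6379);  // assumed Redis address
        try {
            jedis.select(15);                        // BlackRuleWarning works against "jedis15"
            List<XZ_RuleDomain> rules = XZ_RuleDao.getRuleList();
            for (XZ_RuleDomain rule : rules) {
                // e.g. "phone:18609765432" -- the same key blackWarning() builds from the data stream
                String key = rule.getWarn_fieldname() + ":" + rule.getWarn_fieldvalue();
                jedis.hset(key, "id", String.valueOf(rule.getId()));
                jedis.hset(key, "send_type", String.valueOf(rule.getSend_type()));
                jedis.hset(key, "send_mobile", String.valueOf(rule.getSend_mobile()));
            }
        } finally {
            jedis.close();
        }
    }

    public static void main(String[] args) {
        // refresh the rule cache every 5 minutes; a non-daemon Timer keeps the JVM alive
        new Timer("rule-sync", false).schedule(new TimerTask() {
            @Override
            public void run() {
                syncRulesToRedis();
            }
        }, 0, 5 * 60 * 1000L);
    }
}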

WarningMessageDao.java

package com.hsiehchou.spark.warn.dao;

import com.hsiehchou.common.netb.db.DBCommon;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;

public class WarningMessageDao {

    private static final Logger LOG = LoggerFactory.getLogger(WarningMessageDao.class);

    /**
     * 写入消息到mysql
     * @param warningMessage
     * @return
     */
    public static Integer insertWarningMessageReturnId(WarningMessage warningMessage) {
        Connection conn= DBCommon.getConn("test");
        String sql="insert into warn_message(alarmruleid,sendtype,senfinfo,hittime,sendmobile,alarmtype) " +
                "values(?,?,?,?,?,?)";

        PreparedStatement stmt=null;
        ResultSet resultSet=null;
        int id=-1;
        try{
            //返回自增主键,使方法名中的 ReturnId 名副其实
            stmt = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
            stmt.setString(1,warningMessage.getAlarmRuleid());
            stmt.setInt(2,Integer.valueOf(warningMessage.getSendType()));
            stmt.setString(3,warningMessage.getSenfInfo());
            stmt.setTimestamp(4,new Timestamp(System.currentTimeMillis()));
            stmt.setString(5,warningMessage.getSendMobile());
            stmt.setInt(6,Integer.valueOf(warningMessage.getAlarmType()));
            stmt.executeUpdate();
            resultSet = stmt.getGeneratedKeys();
            if (resultSet.next()) {
                id = resultSet.getInt(1);
            }
        }catch(Exception e) {
            LOG.error(null,e);
        }finally {
            try {
                DBCommon.close(conn,stmt,resultSet);
            } catch (SQLException e) {
                LOG.error(null,e);
            }
        }
        return id;
    }
}
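
To complete the picture from step 2, the message table is also read back: the UI polls unread alerts and aggregates counts per alarm type. Below is a minimal sketch using the same DBCommon/QueryRunner combination as XZ_RuleDao; the table and column names come from warn_message.sql, while the isRead = '0' convention for "unread" is an assumption.

WarnMessageQueryExample.java (sketch)

package com.hsiehchou.spark.warn.dao;

import com.hsiehchou.common.netb.db.DBCommon;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.MapListHandler;

import java.sql.Connection;
import java.util.List;
import java.util.Map;

public class WarnMessageQueryExample {

    public static void main(String[] args) throws Exception {
        Connection conn = DBCommon.getConn("test");
        QueryRunner runner = new QueryRunner();
        try {
            // UI polling / scrolling ticker: the 20 most recent unread alerts
            List<Map<String, Object>> unread = runner.query(conn,
                    "select * from warn_message where isRead = '0' order by checkinTime desc limit 20",
                    new MapListHandler());
            unread.forEach(System.out::println);

            // statistics: number of alerts per alarm type
            List<Map<String, Object>> stats = runner.query(conn,
                    "select alarmType, count(*) as cnt from warn_message group by alarmType",
                    new MapListHandler());
            stats.forEach(System.out::println);
        } finally {
            DBCommon.close(conn);
        }
    }
}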

5、Alerting utility classes

Create the package com.hsiehchou.spark.warn.service
and the classes BlackRuleWarning and WarningMessageSendUtil inside it.

BlackRuleWarning.java

package com.hsiehchou.spark.warn.service;

import com.hsiehchou.spark.warn.dao.WarningMessageDao;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;

public class BlackRuleWarning {
    private static final Logger LOG = LoggerFactory.getLogger(BlackRuleWarning.class);
    //可以通过数据库,配置文件加载

    //为了遍历所有预警字段
    private static List<String> listWarnFields = new ArrayList<>();

    static {
        listWarnFields.add("phone");
        listWarnFields.add("mac");
    }

    /**
     * 预警流程处理
     * @param map
     * @param jedis15
     */
    public static void blackWarning(Map<String, Object> map, Jedis jedis15) {

        listWarnFields.forEach(warnField -> {
            if (map.containsKey(warnField) && StringUtils.isNotBlank(map.get(warnField).toString())) {

                //获取预警字段和预警值  相当于手机号
                String warnFieldValue = map.get(warnField).toString();

                //去redis中进行比对
                //数据中  通过   "字段" + "字段值" 去拼接key
                //            phone       :    186XXXXXX
                String key = warnField + ":" + warnFieldValue;

                //redis中的key是   phone:18609765435
                System.out.println("拼接数据流中的key=======" + key);
                if (jedis15.exists(key)) {
                    //对比命中之后 就可以发送消息提醒
                    System.out.println("命中REDIS中的" + key + "===========开始预警");
                    beginWarning(jedis15, key);
                } else {
                    //直接过
                    System.out.println("未命中" + key + "===========不进行预警");
                }
            }
        });
    }

    /**
     * 规则已经命中,开始预警
     * @param jedis15
     * @param key
     */
    private static void beginWarning( Jedis jedis15, String key) {

        System.out.println("============MESSAGE -1- =========");
        //封装告警  信息及告警消息
        WarningMessage warningMessage = getWarningMessage(jedis15, key);


        System.out.println("============MESSAGE -4- =========");
        if (warningMessage != null) {
            //将预警信息写入预警信息表
            WarningMessageDao.insertWarningMessageReturnId(warningMessage);
            //String accountid = warningMessage.getAccountid();
            //String readAccounts = warningMessage.getAlarmaccounts();
            // WarnService.insertRead_status(messageId, accountid);
            if (warningMessage.getSendType().equals("2")) {
                //手机短信告警 默认告警方式
                WarningMessageSendUtil.messageWarn(warningMessage);
            }
        }
    }

    /**
     * 封装告警信息及告警消息
     * @param jedis15
     * @param key
     * @return
     */
    private static WarningMessage getWarningMessage(Jedis jedis15, String key) {
        System.out.println("============MESSAGE -2- =========");
        //封装消息
        String[] split = key.split(":");
        if (split.length == 2) {
            WarningMessage warningMessage = new WarningMessage();
            String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
            String clew_type = split[0];//告警字段
            String rulecontent = split[1];//告警字段值

            //从redis中获取消息信息进行封装
            Map<String, String> valueMap = jedis15.hgetAll(key);

            //规则ID (是哪条规则命中的)
            warningMessage.setAlarmRuleid(valueMap.get("id"));

            //预警方式
            warningMessage.setSendType(valueMap.get("send_type"));//告警方式,0:界面 1:邮件 2:短信 3:邮件+短信

            //预警信息接收手机号
            warningMessage.setSendMobile(valueMap.get("send_mobile"));

            //warningMessage.setSendEmail(valueMap.get("sendemail"));
            /*warningMessage.setAlarmaccounts(valueMap.get("alarmaccounts"));*/
            //规则发布人
            warningMessage.setAccountid(valueMap.get("publisher"));
            warningMessage.setAlarmType("2");
            StringBuffer warn_content = new StringBuffer();

            //预警内容 信息   时间  地点  人物
            //预警字段来进行设置  phone
            //我们有手机号


            //数据关联
            // 手机  MAC  身份证, 车牌  人脸。。URL 姓名
            // 全部设在推送消息里面
            warn_content.append("【网络告警】:预警字段[" + clew_type + "]命中,值为[" + rulecontent + "],命中时间:" + time);
            String content = warn_content.toString();
            warningMessage.setSenfInfo(content);
            System.out.println("============MESSAGE -3- =========");
            return warningMessage;
        } else {
            return null;
        }
    }
}

WarningMessageSendUtil.java

package com.hsiehchou.spark.warn.service;

import com.hsiehchou.common.regex.Validation;
import com.hsiehchou.spark.warn.domain.WarningMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WarningMessageSendUtil {
    private static final Logger LOG = LoggerFactory.getLogger(WarningMessageSendUtil.class);

    public static void messageWarn(WarningMessage warningMessage) {

        String[] mobiles = warningMessage.getSendMobile().split(",");

        for(String phone:mobiles){
            if(Validation.isMobile(phone)){
                System.out.println("开始向手机号为" + phone + "发送告警消息====" + warningMessage);
                StringBuffer sb= new StringBuffer();
                String content=warningMessage.getSenfInfo().toString();
                //TODO  调用短信接口发送消息
                //TODO  怎么通过短信发送  这个是需要公司开通接口
                //TODO  DINGDING
                // 专门的接口
             /*   sb.append(ClusterProperties.https_url + "username=" + ClusterProperties.https_username +
                        "&password=" + ClusterProperties.https_password + "&mobile=" + phone +
                        "&apikey=" + ClusterProperties.https_apikey+
                        "&content=" + URLEncoder.encode(content));*/
               // sendMessage(sb.toString());
            }
        }
    }
}

6、创建redis子项目

This submodule wraps the Redis client operations used by the warning task.

新建xz_bigdata_redis子模块

pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata2</artifactId>
        <groupId>com.hsiehchou</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_redis</artifactId>

    <name>xz_bigdata_redis</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <jedis.version>2.7.0</jedis.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_resources</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>${jedis.version}</version>
        </dependency>
    </dependencies>
</project>

新建com.hsiehchou.redis.client包
创建redis连接类—JedisSingle

JedisSingle.java

package com.hsiehchou.redis.client;

import com.hsiehchou.common.config.ConfigUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisConnectionException;

import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.Properties;

public class JedisSingle {

    private static final Logger LOG = LoggerFactory.getLogger(JedisSingle.class);
    private static Properties redisConf;

    /**
     * 读取redis配置文件
     * redis.hostname = 192.168.247.103
     * redis.port  = 6379
     */
    static {
        redisConf = ConfigUtil.getInstance().getProperties("redis/redis.properties");
        System.out.println(redisConf);
    }

    public static Jedis getJedis(int db){
        Jedis jedis = JedisSingle.getJedis();
        if(jedis!=null){
            jedis.select(db);
        }
        return jedis;
    }

    public static void main(String[] args) {
        Jedis jedis = JedisSingle.getJedis(15);
        Map<String, String> map = jedis.hgetAll("phone:18609765435");
        System.out.println(map.toString());
    }

    public static Jedis getJedis(){
        int timeoutCount = 0;
        while (true) {// 如果是网络超时则多试几次
            try
            {
                 Jedis jedis = new Jedis(redisConf.get("redis.hostname").toString(),
                         Integer.valueOf(redisConf.get("redis.port").toString()));
                return jedis;
            } catch (Exception e)
            {
                if (e instanceof JedisConnectionException || e instanceof SocketTimeoutException)
                {
                    timeoutCount++;
                    LOG.warn("获取jedis连接超时次数:" +timeoutCount);
                    if (timeoutCount > 4)
                    {
                        LOG.error("获取jedis连接超时次数a:" +timeoutCount);
                        LOG.error(null,e);
                        break;
                    }
                }else
                {
                    LOG.error("getJedis error", e);
                    break;
                }
            }
        }
        return null;
    }

    public static void close(Jedis jedis){
        if(jedis!=null){
            jedis.close();
        }
    }
}

7、创建定时任务,将规则同步到redis

新建 com.hsiehchou.spark.warn.timer 包
新建 SyncRule2Redis,WarnHelper

SyncRule2Redis.java

package com.hsiehchou.spark.warn.timer;

import java.util.TimerTask;

public class SyncRule2Redis extends TimerTask {
    @Override
    public void run() {
        //这里定义同步方法
        //就是读取mysql的数据 然后写入到redis中
        System.out.println("========开始同步MYSQL规则到redis=======");
        WarnHelper.syncRuleFromMysql2Redis();
        System.out.println("============开始同步规则成功===========");
    }
}

WarnHelper.java

package com.hsiehchou.spark.warn.timer;

import com.hsiehchou.redis.client.JedisSingle;
import com.hsiehchou.spark.warn.dao.XZ_RuleDao;
import com.hsiehchou.spark.warn.domain.XZ_RuleDomain;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.jedis.Jedis;

import java.util.List;

public class WarnHelper {

    private static final Logger LOG = LoggerFactory.getLogger(WarnHelper.class);

    /**
     * 同步mysql规则数据到redis
     */
    public static void syncRuleFromMysql2Redis(){
        //获取所有的规则
        List<XZ_RuleDomain> ruleList = XZ_RuleDao.getRuleList();
        Jedis jedis = null;
        try {
            //获取redis 客户端
            jedis = JedisSingle.getJedis(15);
            for (int i = 0; i < ruleList.size(); i++) {
                XZ_RuleDomain rule = ruleList.get(i);
                String id = rule.getId()+"";
                String publisher = rule.getPublisher();
                String warn_fieldname = rule.getWarn_fieldname();
                String warn_fieldvalue = rule.getWarn_fieldvalue();
                String send_mobile = rule.getSend_mobile();
                String send_type = rule.getSend_type();

                //拼接redis key值
                String redisKey = warn_fieldname +":" + warn_fieldvalue;

                //通过redis hash结构   hashMap
                jedis.hset(redisKey,"id",StringUtils.isNoneBlank(id) ? id : "");
                jedis.hset(redisKey,"publisher",StringUtils.isNoneBlank(publisher) ? publisher : "");
                jedis.hset(redisKey,"warn_fieldname",StringUtils.isNoneBlank(warn_fieldname) ? warn_fieldname : "");
                jedis.hset(redisKey,"warn_fieldvalue",StringUtils.isNoneBlank(warn_fieldvalue) ? warn_fieldvalue : "");
                jedis.hset(redisKey,"send_mobile",StringUtils.isNoneBlank(send_mobile) ? send_mobile : "");
                jedis.hset(redisKey,"send_type",StringUtils.isNoneBlank(send_type) ? send_type : "");
            }
        } catch (Exception e) {
           LOG.error("同步规则到redis失败",e);
        } finally {
            JedisSingle.close(jedis);
        }
    }

    public static void main(String[] args)
    {
        WarnHelper.syncRuleFromMysql2Redis();
    }
}

8、创建streaming流任务

scala/com/hsiehchou/spark/streaming/kafka/warn
WarningStreamingTask.scala

package com.hsiehchou.spark.streaming.kafka.warn

import java.util.Timer

import com.hsiehchou.redis.client.JedisSingle
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming.kafkaConfig
import com.hsiehchou.spark.warn.service.BlackRuleWarning
import com.hsiehchou.spark.warn.timer.SyncRule2Redis
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaManager
import redis.clients.jedis.Jedis

object WarningStreamingTask extends Serializable with Logging{

  def main(args: Array[String]): Unit = {

     //定义一个定时器去定时同步 MYSQL到REDIS
     val timer : Timer = new Timer

    //SyncRule2Redis 任务类
    //0 第一次开始执行
    //1*60*1000  隔多少时间执行一次
    timer.schedule(new SyncRule2Redis,0,1*60*1000)

     //从kafka中获取数据流
     //val topics = "chl_test7".split(",")
     //kafka topic
     val topics = "chl_test7".split(",")

     //val ssc = SparkContextFactory.newSparkLocalStreamingContext("WarningStreamingTask1", java.lang.Long.valueOf(10),1)
     val ssc:StreamingContext = SparkContextFactory.newSparkStreamingContext("WarningStreamingTask", java.lang.Long.valueOf(10))

    //构建kafkaManager
    val kafkaManager = new KafkaManager(
      Spark_Kafka_ConfigUtil.getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"), "WarningStreamingTask111")
    )
    //使用kafkaManager创建DStreaming流
    val kafkaDS = kafkaManager.createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
      //添加一个日期分组字段
      //如果数据其他的转换,可以先在这里进行统一转换
       .persist(StorageLevel.MEMORY_AND_DISK)


    kafkaDS.foreachRDD(rdd=>{

      //流量预警
      //if(!rdd.isEmpty()){
/*      val count_flow = rdd.map(x=>{
          val flow = java.lang.Long.valueOf(x.get("collect_time"))
          flow
        }).reduce(_+_)
      if(count_flow > 1719179595L){
        println("流量预警: 阈值[1719179595L] 实际值:"+ count_flow)
      }*/
      //}

      //客户端连接之类的 最好不要放在RDD外面,因为在处理partion时,数据需要分发到各个节点上去
      //数据分发必须需要序列化才可以,如果不能序列化,分发会报错
      //如果这个数据 包括他里面的内容 都可以序列化,那么可以直接放在RDD外面
      //create and close the Jedis connection inside each partition (see the note above)
      rdd.foreachPartition(partion => {
        var jedis: Jedis = null
        try {
          jedis = JedisSingle.getJedis(15)
          while (partion.hasNext) {
            val map = partion.next()
            val table = map.get("table")
            val mapObject = map.asInstanceOf[java.util.Map[String, Object]]
            println(table)
            //开始比对
            BlackRuleWarning.blackWarning(mapObject, jedis)
          }
        } catch {
          case e: Exception => logError("blackWarning failed", e)
        } finally {
          JedisSingle.close(jedis)
        }
      })



    })

    ssc.start()
    ssc.awaitTermination()
  }
}

9、执行

spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.warn.WarningStreamingTask /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar

10、截图

redis安装成功

预警

RedisManager

mysql-xz_rule

发送预警

11、redis安装

解压:tar -zxvf redis-3.0.5.tar.gz
cd redis-3.0.5/
make
make PREFIX=/opt/software/redis install

redis-benchmark : Redis提供的压力测试工具。模拟产生客户端的压力
redis-check-aof : 检查aof日志文件
redis-check-dump : 检查rdb文件
redis-cli : Redis客户端脚本
redis-sentinel : 哨兵
redis-server : Redis服务器脚本

核心配置文件:redis.conf
[root@hsiehchou202 redis-3.0.5]# cp redis.conf /opt/software/redis
[root@hsiehchou202 redis]# mkdir conf
[root@hsiehchou202 redis]# mv redis.conf conf/
[root@hsiehchou202 conf]# vi redis.conf

Line 42: daemonize yes   //run in the background
Line 50: port 6379

Start Redis: ./bin/redis-server conf/redis.conf

Check that the server is up (PONG means it is running):
[root@hsiehchou202 redis]# bin/redis-cli ping
PONG

十、Spark—kafka2hive

1、CDH启用Hive on spark

Set the Hive on Spark parameters in CDH.
Hive's original execution engine is Hadoop MapReduce; Hive on Spark simply replaces the execution engine with Spark.

2、hive配置文件

scala/com/hsiehchou/spark/streaming/kafka/kafka2hdfs/

HiveConfig.scala

package com.hsiehchou.spark.streaming.kafka.kafka2hdfs

import java.util

import org.apache.commons.configuration.{CompositeConfiguration, ConfigurationException, PropertiesConfiguration}
import org.apache.spark.Logging
import org.apache.spark.sql.types.{StringType, StructField, StructType}

import scala.collection.mutable.ArrayBuffer
import scala.collection.JavaConversions._

object HiveConfig extends Serializable with Logging {

  //HIVE 文件根目录
  var hive_root_path = "/apps/hive/warehouse/external/"
  var hiveFieldPath = "es/mapping/fieldmapping.properties"

  var config: CompositeConfiguration = null

  //所有的表
  var tables: util.List[_] = null

  //表对应所有的字段映射,可以通过table名获取 这个table的所有字段
  var tableFieldsMap: util.Map[String, util.HashMap[String, String]] = null

  //StructType
  var mapSchema: util.Map[String, StructType] = null

  //建表语句
  var hiveTableSQL: util.Map[String, String] = null

  /**
    * 主要就是创建mapSchema  和  hiveTableSQL
    */
  initParams()

  def main(args: Array[String]): Unit = {
  }

  /**
    * 初始化HIVE参数
    */
  def initParams(): Unit = {
    //加载es/mapping/fieldmapping.properties 配置文件
    config = HiveConfig.readCompositeConfiguration(hiveFieldPath)
    println("==========================config====================================")

    config.getKeys.foreach(key => {
      println(key + ":" + config.getProperty(key.toString))
    })
    println("==========================tables====================================")
    //wechat,mail,qq
    tables = config.getList("tables")

    tables.foreach(table => {
      println(table)
    })

    var tables1 = config.getProperty("tables")

    println("======================tableFieldsMap================================")
    //(qq,{qq.imsi=string, qq.id=string, qq.send_message=string, qq.filename=string})
    tableFieldsMap = HiveConfig.getKeysByType()
    tableFieldsMap.foreach(x => {
      println(x)
    })
    println("=========================mapSchema===================================")
    mapSchema = HiveConfig.createSchema()
    mapSchema.foreach(x => {
//      val structType = x._2
//      println("-----------")
//      println(structType)
//
//
//      val names = structType.fieldNames
//      names.foreach(field => {
//        println(field)
//      })
      println(x)
    })
    println("=========================hiveTableSQL===================================")
    hiveTableSQL = HiveConfig.getHiveTables()
    hiveTableSQL.foreach(x => {
      println(x)
    })
  }

  /**
    * 读取hive 字段配置文件
    * @param path
    * @return
    */
  def readCompositeConfiguration(path: String): CompositeConfiguration = {
    logInfo("加载配置文件 " + path)
    //多配置工具
    val compositeConfiguration = new CompositeConfiguration
    try {
      val configuration = new PropertiesConfiguration(path)
      compositeConfiguration.addConfiguration(configuration)
    } catch {
      case e: ConfigurationException => {
        logError("加载配置文件 " + path + "失败", e)
      }
    }
    logInfo("加载配置文件" + path + "成功。 ")
    compositeConfiguration
  }

  /**
    * 获取table-字段 对应关系
    * 使用 util.Map[String,util.HashMap[String, String结构保存
    * @return
    */
  def getKeysByType(): util.Map[String, util.HashMap[String, String]] = {

    val map = new util.HashMap[String, util.HashMap[String, String]]()
    println("__________________tables_____________________"+tables)
    //wechat, mail, qq
    val iteratorTable = tables.iterator()

    //对每个表进行遍历
    while (iteratorTable.hasNext) {

      //使用一个MAP保存一种对应关系
      val fieldMap = new util.HashMap[String, String]()

      //获取一个表
      val table: String = iteratorTable.next().toString
      //获取这个表的所有字段
      val fields = config.getKeys(table)
      //获取通用字段  这里暂时没有
      val commonKeys: util.Iterator[String] = config.getKeys("common").asInstanceOf[util.Iterator[String]]

      //将通用字段放到map结构中去
      while (commonKeys.hasNext) {
        val key = commonKeys.next()
        fieldMap.put(key.replace("common", table), config.getString(key))
      }

      //将每种表的私有字段放到map中去
      while (fields.hasNext) {
        val field = fields.next().toString
        fieldMap.put(field, config.getString(field))
        println("__________________field_____________________"+"\n"+field)
      }
      map.put(table, fieldMap)
    }
    map
  }

  /**
    * 构建建表语句
    * 例如CREATE external TABLE IF NOT EXISTS qq (imei string,imsi string,longitude string,latitude string,phone_mac string,device_mac string,device_number string,collect_time string,username string,phone string,object_username string,send_message string,accept_message string,message_time string,id string,table string,filename string,absolute_filename string)
    * @return
    */
  def getHiveTables(): util.Map[String, String] = {

    val hiveTableSqlMap: util.Map[String, String] = new util.HashMap[String, String]()

    //获取每种数据的建表语句
    tables.foreach(table => {

      var sql: String = s"CREATE external TABLE IF NOT EXISTS ${table} ("

      val tableFields = config.getKeys(table.toString)
      tableFields.foreach(tableField => {
        //qq.imsi=string, qq.id=string, qq.send_message=string
        val fieldType = config.getProperty(tableField.toString)
        val field = tableField.toString.split("\\.")(1)
        sql = sql + field
        fieldType match {
          //就是将配置中的类型映射为HIVE 建表语句中的类型
          case "string" => sql = sql + " string,"
          case "long" => sql = sql + " string,"
          case "double" => sql = sql + " string,"
          case _ => println("Nothing Matched!!" + fieldType)
        }
      })
      sql = sql.substring(0, sql.length - 1)
      //sql = sql + s")STORED AS PARQUET location '${hive_root_path}${table}'"
      sql = sql + s") partitioned by(year string,month string,day string) STORED AS PARQUET " + s"location '${hive_root_path}${table}'"
      hiveTableSqlMap.put(table.toString, sql)
    })
    hiveTableSqlMap
  }

  /**
    * 使用tableFieldsMap
    * 对每种类型数据创建对应的Schema
    * @return
    */
  def createSchema(): util.Map[String, StructType] = {
    // schema  表结构
    /*   CREATE TABLE `warn_message` (
         //arrayStructType
         `id` int(11) NOT NULL AUTO_INCREMENT,
         `alarmRuleid` varchar(255) DEFAULT NULL,
         `alarmType` varchar(255) DEFAULT NULL,
         `sendType` varchar(255) DEFAULT NULL,
         `sendMobile` varchar(255) DEFAULT NULL,
         `sendEmail` varchar(255) DEFAULT NULL,
         `sendStatus` varchar(255) DEFAULT NULL,
         `senfInfo` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
         `hitTime` datetime DEFAULT NULL,
         `checkinTime` datetime DEFAULT NULL,
         `isRead` varchar(255) DEFAULT NULL,
         `readAccounts` varchar(255) DEFAULT NULL,
         `alarmaccounts` varchar(255) DEFAULT NULL,
         `accountid` varchar(11) DEFAULT NULL,
         PRIMARY KEY (`id`)
       ) ENGINE=MyISAM AUTO_INCREMENT=528 DEFAULT CHARSET=latin1;*/

    val mapStructType: util.Map[String, StructType] = new util.HashMap[String, StructType]()

    for (table <- tables) {
      //通过tableFieldsMap 拿到这个表的所有字段
      val tableFields = tableFieldsMap.get(table)
      //对这个字段进行遍历
      val keyIterator = tableFields.keySet().iterator()
      //创建ArrayBuffer
      var arrayStructType = ArrayBuffer[StructField]()
      while (keyIterator.hasNext) {
        val key = keyIterator.next()
        val value = tableFields.get(key)

        //将key拆分 获取 "."后面的部分作为数据字段
        val field = key.split("\\.")(1)
        value match {
          /* case "string" => arrayStructType += StructField(field, StringType, true)
           case "long"   => arrayStructType += StructField(field, LongType, true)
           case "double"   => arrayStructType += StructField(field, DoubleType, true)*/
          case "string" => arrayStructType += StructField(field, StringType, true)
          case "long" => arrayStructType += StructField(field, StringType, true)
          case "double" => arrayStructType += StructField(field, StringType, true)
          case _ => println("Nothing Matched!!" + value)
        }
      }
      val schema = StructType(arrayStructType)
      mapStructType.put(table.toString, schema)
    }
    mapStructType
  }
}

3、kafka写hdfs和创建hive表

Kafka2HiveTest.scala

package com.hsiehchou.spark.streaming.kafka.kafka2hdfs

import java.util

import com.hsiehchou.hdfs.HdfsAdmin
import com.hsiehchou.hive.HiveConf
import com.hsiehchou.spark.common.{SparkContextFactory}
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming.kafkaConfig
import org.apache.hadoop.fs.Path
import org.apache.spark.{Logging}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.{DataFrame, Row, SaveMode}
import org.apache.spark.sql.types.StructType
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaManager

import scala.collection.JavaConversions._

object Kafka2HiveTest extends Serializable with Logging{

  val topics = "chl_test7".split(",")

  //获取所有数据类型
  //获取所有数据的Schema
  def main(args: Array[String]): Unit = {
    //val ssc = SparkContextFactory.newSparkLocalStreamingContext("XZ_kafka2es", java.lang.Long.valueOf(10),1)

    val ssc = SparkContextFactory.newSparkStreamingContext("Kafka2HiveTest", java.lang.Long.valueOf(10))

    //1.创建HIVE表  hiveSQL已經創建好了
    val sc = ssc.sparkContext
    val hiveContext: HiveContext = HiveConf.getHiveContext(sc)
    hiveContext.setConf("spark.sql.parquet.mergeSchema", "true")
    createHiveTable(hiveContext)

    //kafka拿到流数据
    val kafkaDS = new KafkaManager(Spark_Kafka_ConfigUtil
                                    .getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"),
                                      "Kafka2HiveTest"))
                                    .createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
                                    .persist(StorageLevel.MEMORY_AND_DISK)

    HiveConfig.tables.foreach(table=>{
      //过滤出单一数据类型(获取和table相同类型的所有数据)
       val tableDS = kafkaDS.filter(x => {table.equals(x.get("table"))})

      //获取数据类型的schema 表结构
      val schema = HiveConfig.mapSchema.get(table)

      //获取这个表的所有字段
      val schemaFields: Array[String] = schema.fieldNames
      tableDS.foreachRDD(rdd=>{

        //TODO 数据写入HDFS
        /* val sc = rdd.sparkContext
        val hiveContext = HiveConf.getHiveContext(sc)
        hiveContext.sql(s"USE DEFAULT")*/

        //将RDD转为DF   原因:要加字段描述,写比较方便
        val tableDF = rdd2DF(rdd,schemaFields,hiveContext,schema)

        //多种数据一起处理
        val path_all = s"hdfs://hadoop1:8020${HiveConfig.hive_root_path}${table}"
        val exists = HdfsAdmin.get().getFs.exists(new Path(path_all))

        //2.写到HDFS   不管存不存在我们都要把数据写入进去 通过追加的方式
        //每10秒写一次,写一次会生成一个文件
        tableDF.write.mode(SaveMode.Append).parquet(path_all)

        //3.加载数据到HIVE
        if (!exists) {
          //如果不存在 进行首次加载
          System.out.println("===================开始加载数据到分区=============")
          hiveContext.sql(s"ALTER TABLE ${table} SET LOCATION '${path_all}'")
        }
      })
    })
    ssc.start()
    ssc.awaitTermination()
  }

  /**
    * 创建HIVE表
    * @param hiveContext
    */
  def createHiveTable(hiveContext: HiveContext): Unit ={
    val keys = HiveConfig.hiveTableSQL.keySet()
    keys.foreach(key=>{
      val sql = HiveConfig.hiveTableSQL.get(key)
      //通过hiveContext 和已经创建好的SQL语句去创建HIVE表
      hiveContext.sql(sql)
      println(s"创建表${key}成功")
    })
  }

  /**
    * 将RDD转为DF
    * @param rdd
    * @param schemaFields
    * @param hiveContext
    * @param schema
    * @return
    */
  def rdd2DF(rdd:RDD[util.Map[String,String]],
             schemaFields: Array[String],
             hiveContext:HiveContext,
             schema:StructType): DataFrame ={

      //将RDD[Map[String,String]]转为RDD[ROW]
      val rddRow = rdd.map(record => {
        val listRow: util.ArrayList[Object] = new util.ArrayList[Object]()
          for (schemaField <- schemaFields) {
            listRow.add(record.get(schemaField))
          }
          Row.fromSeq(listRow)
          //所有分区合并成一个
      }).repartition(1)
    //构建DF
    //def createDataFrame(rowRDD: RDD[Row], schema: StructType)
    val typeDF = hiveContext.createDataFrame(rddRow, schema)
    typeDF
  }
}

4、Kafka2HiveTest 执行

spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.Kafka2HiveTest /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar

存到hdfs中

hive查询1

5、xz_bigdata_spark/src/java/

com/hsiehchou/hdfs
HdfsAdmin.java—HDFS 文件操作类

package com.hsiehchou.hdfs;

import com.hsiehchou.common.adjuster.StringAdjuster;
import com.hsiehchou.common.file.FileCommon;
import com.google.common.base.Preconditions;
import com.google.common.collect.Lists;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.log4j.Logger;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.reflect.Array;
import java.util.Collection;
import java.util.List;

/**
 * HDFS 文件操作类
 */
public class HdfsAdmin {

    private static Logger LOG;
    private static final String HDFS_SITE = "/hadoop/hdfs-site.xml";
    private static final String CORE_SITE = "/hadoop/core-site.xml";

    private volatile static HdfsAdmin hdfsAdmin;

    private  FileSystem fs;

    private HdfsAdmin(Configuration conf, Logger logger){
        //set the logger first so it is usable if FileSystem.get fails below
        LOG = (logger != null) ? logger : Logger.getLogger(HdfsAdmin.class);
        try {
            if(conf == null) conf = newConf();
            conf.set("fs.defaultFS","hdfs://hadoop1:8020");
            fs = FileSystem.get(conf);
        } catch (IOException e) {
            LOG.error("获取 hdfs的FileSystem出现异常。", e);
        }
        Preconditions.checkNotNull(fs, "没有获取到可用的Hdfs的FileSystem");
    }

    private Configuration newConf(){

        Configuration conf = new Configuration();
        if(FileCommon.exist(HDFS_SITE)) conf.addResource(HDFS_SITE);
        if(FileCommon.exist(CORE_SITE)) conf.addResource(CORE_SITE);
        return conf;
    }

    public static HdfsAdmin get(){
        return get(null);
    }

    /**
     * 获取hdfsAdmin
     * @param logger
     * @return
     */
    public static HdfsAdmin get(Logger logger){
        if(hdfsAdmin == null){
            synchronized (HdfsAdmin.class){
                if(hdfsAdmin == null) hdfsAdmin = new HdfsAdmin(null, logger);
            }
        }
        return hdfsAdmin;
    }

    public static HdfsAdmin get(Configuration conf, Logger logger){
        if(hdfsAdmin == null){
            synchronized (HdfsAdmin.class){
                if(hdfsAdmin == null) hdfsAdmin = new HdfsAdmin(conf, logger);
            }
        }
        return hdfsAdmin;
    }

    public FileStatus getFileStatus(String dir) {
        FileStatus fileStatus = null;
        try {
            fileStatus = fs.getFileStatus(new Path(dir));
        } catch (IOException e) {
            LOG.error(String.format("获取文件 %s信息失败。", dir), e);
        }
        return fileStatus;
    }

    public void createFile(String dst , byte[] contents){
        //目标路径
        Path dstPath = new Path(dst);

        //打开一个输出流
        FSDataOutputStream outputStream;
        try {
            outputStream = fs.create(dstPath);
            outputStream.write(contents);
            outputStream.flush();
            outputStream.close();
        } catch (IOException e) {
            LOG.error(String.format("创建文件 %s 失败。", dst), e);
        }
        LOG.info(String.format("文件: %s 创建成功!", dst));
    }

    //上传本地文件
    public void uploadFile(String src,String dst){
        //原路径
        Path srcPath = new Path(src);

        //目标路径
        Path dstPath = new Path(dst);

        //调用文件系统的文件复制函数,前面参数是指是否删除原文件,true为删除,默认为false
        try {
            fs.copyFromLocalFile(false,srcPath, dstPath);
        } catch (IOException e) {
            LOG.error(String.format("上传文件 %s 到 %s 失败。", src, dst), e);
        }
        //打印文件路径
        LOG.info(String.format("上传文件 %s 到 %s 完成。", src, dst));
    }

    public void downloadFile(String src , String dst){
        Path dstPath = new Path(dst) ;
        try {
            fs.copyToLocalFile(false, new Path(src), dstPath);
        } catch (IOException e) {
            LOG.error(String.format("下载文件 %s 到 %s 失败。", src, dst), e);
        }
        LOG.info(String.format("下载文件 %s 到 %s 完成", src, dst));
    }

    //文件重命名
    public void rename(String oldName,String newName){

        Path oldPath = new Path(oldName);
        Path newPath = new Path(newName);
        boolean isok = false;
        try {
            isok = fs.rename(oldPath, newPath);
        } catch (IOException e) {
            LOG.error(String.format("重命名文件 %s 为 %s 失败。", oldName, newName), e);
        }
        if(isok){
            LOG.info(String.format("重命名文件 %s 为 %s 完成。", oldName, newName));
        }else{
            LOG.error(String.format("重命名文件 %s 为 %s 失败。", oldName, newName));
        }
    }

    public void delete(String path){
        delete(path, true);
    }

    //删除文件
    public void delete(String path, boolean recursive){

        Path deletePath = new Path(path);
        boolean isok = false;
        try {
            isok = fs.delete(deletePath, recursive);
        } catch (IOException e) {
            LOG.error(String.format("删除文件 %s 失败。", path), e);
        }
        if(isok){
            LOG.info(String.format("删除文件 %s 完成。", path));
        }else{
            LOG.error(String.format("删除文件 %s 失败。", path));
        }
    }

    //创建目录
    public void mkdir(String path){

        Path srcPath = new Path(path);
        boolean isok = false;
        try {
            isok = fs.mkdirs(srcPath);
        } catch (IOException e) {
            LOG.error(String.format("创建目录 %s 失败。", path), e);
        }
        if(isok){
            LOG.info(String.format("创建目录 %s 完成。", path));
        }else{
            LOG.error(String.format("创建目录 %s 失败。", path));
        }
    }

    //读取文件的内容
    public InputStream readFile(String filePath){
        Path srcPath = new Path(filePath);
        InputStream in = null;
        try {
           in = fs.open(srcPath);
        } catch (IOException e) {
            LOG.error(String.format("读取文件  %s 失败。", filePath), e);
        }
        return in;
    }

    public <T> void readFile(String filePath, StringAdjuster<T> adjuster, Collection<T> result){
        InputStream inputStream = readFile(filePath);
        if(inputStream != null){
            InputStreamReader reader = new InputStreamReader(inputStream);
            BufferedReader bufferedReader = new BufferedReader(reader);
            String line;
            try {
                T t;
                while((line = bufferedReader.readLine()) != null){
                    t = adjuster.doAdjust(line);
                    if(t != null)result.add(t);
                }
            } catch (IOException e) {
                LOG.error(String.format("利用缓冲流读取文件  %s 失败。", filePath), e);
            }finally {
                IOUtils.closeQuietly(bufferedReader);
                IOUtils.closeQuietly(reader);
                IOUtils.closeQuietly(inputStream);
            }
        }
    }

    public List<String> readLines(String filePath){
        return readLines(filePath, "UTF-8");
    }

    public  List<String> readLines(String filePath, String encoding){
        InputStream inputStream = readFile(filePath);
        List<String> lines = null;
        if(inputStream != null) {
            try {
                lines = IOUtils.readLines(inputStream, encoding);
            } catch (IOException e) {
                LOG.error(String.format("按行读取文件 %s 失败。", filePath), e);
            }finally {
                IOUtils.closeQuietly(inputStream);
            }
        }
        return lines;
    }

    public List<FileStatus> findNewFileOrDirInDir(String dir, HdfsFileFilter filter,
                                                final boolean onlyFile, final boolean onlyDir){
       return findNewFileOrDirInDir(dir, filter, onlyFile, onlyDir, false);
    }

    public List<FileStatus> findNewFileOrDirInDir(String dir, HdfsFileFilter filter,
                          final boolean onlyFile, final boolean onlyDir, boolean recursive){
        if(onlyFile && onlyDir){
            FileStatus fileStatus = getFileStatus(dir);
            if(fileStatus == null)return Lists.newArrayList();
            if(isAccepted(fileStatus,filter)){
                return Lists.newArrayList(fileStatus);
            }
            return Lists.newArrayList();
        }

       if(onlyFile){
           return findNewFileInDir(dir, filter, recursive);
       }

       if(onlyDir){
           return findNewDirInDir(dir, filter, recursive);
       }
       return Lists.newArrayList();
    }

    /**
     * 查找一个文件夹中 新建的目录
     * @param dir
     * @param filter
     * @return
     */
    public List<FileStatus> findNewDirInDir(String dir, HdfsFileFilter filter){
        return findNewDirInDir(new Path(dir), filter, false);
    }
    public List<FileStatus> findNewDirInDir(Path path, HdfsFileFilter filter){
        return findNewDirInDir(path, filter, false);
    }

    public List<FileStatus> findNewDirInDir(String dir, HdfsFileFilter filter, boolean recursive){
        return findNewDirInDir(new Path(dir), filter, recursive);
    }

    public List<FileStatus> findNewDirInDir(Path path, HdfsFileFilter filter, boolean recursive){
        FileStatus[] files = null;
        try {
            files = fs.listStatus(path);
        } catch (IOException e) {
            LOG.error(String.format("获取目录 %s下的文件列表失败。", path), e);
        }
        if(files == null)return Lists.newArrayList();

        List<FileStatus> paths = Lists.newArrayList();
        List<String> res = Lists.newArrayList();
        for(FileStatus fileStatus : files){
            if (fileStatus.isDirectory()) {
                if (isAccepted(fileStatus, filter)) {
                    paths.add(fileStatus);
                    res.add(fileStatus.getPath().toString());
                }else if(recursive){
                    paths.addAll(findNewDirInDir(fileStatus.getPath(), filter, recursive));
                }
            }
        }
        LOG.info(String.format("从目录%s 找到满足条件%s 有如下 %s 个文件: %s",
                path, filter,res.size(), res));
        return paths;
    }

    /**
     * 查找一个文件夹中 新建的文件
     * @param dir
     * @param filter
     * @return
     */
    public List<FileStatus> findNewFileInDir(String dir, HdfsFileFilter filter){
        return  findNewFileInDir(new Path(dir), filter, false);
    }

    public List<FileStatus> findNewFileInDir(String dir, HdfsFileFilter filter, boolean recursive){
        return  findNewFileInDir(new Path(dir), filter, recursive);
    }

    public List<FileStatus> findNewFileInDir(Path path, HdfsFileFilter filter){
        return  findNewFileInDir(path, filter, false);
    }

    public List<FileStatus> findNewFileInDir(Path path, HdfsFileFilter filter, boolean recursive){

        FileStatus[] files = null;
        try {
            files = fs.listStatus(path);
        } catch (IOException e) {
            LOG.error(String.format("获取目录 %s下的文件列表失败。", path), e);
        }
        if(files == null)return Lists.newArrayList();

        List<FileStatus> paths = Lists.newArrayList();
        List<String> res = Lists.newArrayList();
        for(FileStatus fileStatus : files){
            if (fileStatus.isFile()) {
                if (isAccepted(fileStatus, filter)) {
                    paths.add(fileStatus);
                    res.add(fileStatus.getPath().toString());
                }
            }else if(recursive){
                paths.addAll(findNewFileInDir(fileStatus.getPath(), filter, recursive));
            }
        }
        LOG.info(String.format("从目录%s 找到满足条件%s 有如下 %s 个文件: %s", path, filter,res.size(), res));

        return paths;
    }

    private boolean isAccepted(String file, HdfsFileFilter filter) {
        if(filter == null) return true;
        FileStatus fileStatus = getFileStatus(file);
        if(fileStatus == null)return false;
        return isAccepted(fileStatus, filter);
    }

    private boolean isAccepted(FileStatus fileStatus, HdfsFileFilter filter) {
        return  filter == null ? true : filter.filter(fileStatus);
    }

    public long getModificationTime(Path path){
        try {
            FileStatus status = fs.getFileStatus(path);
            return status.getModificationTime();
        } catch (IOException e) {
            LOG.error(String.format("获取路径 %s信息失败。", path), e);
        }
        return -1L;
    }

    public FileSystem getFs() {
        return fs;
    }

    public static void main(String[] args) throws Exception {
        // HdfsAdmin hdfsAdmin = HdfsAdmin.get();
       // hdfsAdmin.mkdir("hdfs://hdp04.ultiwill.com:8020/test1111");
        //System.out.println(hdfsAdmin.getFs().exists(new Path("hdfs://hdp04.ultiwill.com:8020/test")));
        //hdfsAdmin.delete("hdfs://hdp04.ultiwill.com:8020/test1111");
        //System.out.println("hdfsAdmin = " + );
       // List<FileStatus> status = hdfsAdmin.findNewDirInDir("hdfs://hdp04.ultiwill.com:50070/hdp", null);
        //System.out.println("status = " + status.size());
    }
}

HdfsFileFilter.java

package com.hsiehchou.hdfs;

import com.hsiehchou.common.filter.Filter;
import org.apache.hadoop.fs.FileStatus;

public abstract class HdfsFileFilter implements Filter<FileStatus> {

}

com/hsiehchou/hive
HiveConf.java

package com.hsiehchou.hive;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.hive.HiveContext;

import java.util.Iterator;
import java.util.Map;

public class HiveConf {

    //private static String DEFUALT_CONFIG = "spark/hive/hive-server-config";
    private static HiveConf hiveConf;
    private static HiveContext hiveContext;

    private HiveConf(){

    }

    public static HiveConf getHiveConf(){
        if(hiveConf==null){
            synchronized (HiveConf.class){
                if(hiveConf==null){
                    hiveConf=new  HiveConf();
                }
            }
        }
        return hiveConf;
    }

    public static HiveContext getHiveContext(SparkContext sparkContext){
        if(hiveContext==null){
            synchronized (HiveConf.class){
                if(hiveContext==null){
                    hiveContext = new  HiveContext(sparkContext);
                    Configuration conf = new Configuration();
                    conf.addResource("spark/hive/hive-site.xml");
                    Iterator<Map.Entry<String, String>> iterator = conf.iterator();
                    while (iterator.hasNext()) {
                        Map.Entry<String, String> next = iterator.next();
                        hiveContext.setConf(next.getKey(), next.getValue());
                    }
                    hiveContext.setConf("spark.sql.parquet.mergeSchema", "true");
                }
            }
        }
        return hiveContext;
    }
}

6、小文件合并

scala/com/hsiehchou/spark/streaming/kafka/kafka2hdfs

CombineHdfs.scala—合并HDFS小文件任务

package com.hsiehchou.spark.streaming.kafka.kafka2hdfs

import com.hsiehchou.hdfs.HdfsAdmin
import com.hsiehchou.spark.common.SparkContextFactory
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.Logging
import org.apache.spark.sql.{SQLContext, SaveMode}

import scala.collection.JavaConversions._

/**
  * 合并HDFS小文件任务
  */
object CombineHdfs extends Serializable with Logging{

  def main(args: Array[String]): Unit = {
    //  val sparkContext = SparkContextFactory.newSparkBatchContext("CombineHdfs")

    val sparkContext = SparkContextFactory.newSparkLocalBatchContext("CombineHdfs")

    //创建一个 sparkSQL
    val sqlContext: SQLContext = new SQLContext(sparkContext)

    //遍历表 就是遍历HIVE表
    HiveConfig.tables.foreach(table=>{

      //获取HDFS文件目录
      //apps/hive/warehouse/external/mail类似
      //apps/hive/warehouse/external/mail
      val table_path = s"${HiveConfig.hive_root_path}$table"

      //通过sparkSQL 加载 这些目录的文件
      val tableDF = sqlContext.read.load(table_path)

      //先获取原来数据种的所有文件  HDFS文件 API
      val fileSystem:FileSystem = HdfsAdmin.get().getFs

      //通过globStatus 获取目录下的正则匹配文件
      //fileSystem.listFiles()
      val arrayFileStatus = fileSystem.globStatus(new Path(table_path+"/part*"))

      //stat2Paths将文件状态转为文件路径   这个文件路径是用来删除的
      val paths = FileUtil.stat2Paths(arrayFileStatus)

      //写入合并文件   //repartition 需要根据生产中实际情况去定义
      tableDF.repartition(1).write.mode(SaveMode.Append).parquet(table_path)
      println("写入" + table_path +"成功")

      //删除小文件
      paths.foreach(path =>{
        HdfsAdmin.get().getFs.delete(path)
        println("删除文件" + path + "成功")
      })
    })
  }
}

7、定时任务

命令行输入:crontab -e

内容:
0 1 * * * spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ‘ ‘ ‘,’) --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.CombineHdfs /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar

Explanation:
* * * * *  command-to-run

Field     Meaning                Range
1st "*"   minute of the hour     0-59
2nd "*"   hour of the day        0-23
3rd "*"   day of the month       1-31
4th "*"   month of the year      1-12
5th "*"   day of the week        0-7 (0 and 7 both mean Sunday)

So the entry above, 0 1 * * *, runs the small-file merge job at 01:00 every day.

8、合并小文件截图

合并小文件

9、hive命令

show tables;

hdfs dfs -ls /apps/hive/warehouse/external

hdfs dfs -rm -r /apps/hive/warehouse/external/mail

drop table mail;

desc qq;

select * from qq limit 1;
select count(*) from qq;

/usr/bin下面的启动zookeeper客户端
zookeeper-client

删除zookeeper里面的消费者数据
rmr /consumers/WarningStreamingTask2/offsets

rmr /consumers/Kafka2HiveTest/offsets

rmr /consumers/DataRelationStreaming1/offsets

十一、Spark—Kafka2Hbase

1、数据关联

(1) Why relate the data
Problem: as it stands we cannot fully understand how the different records relate to each other.

Data relation is used very widely in industry. The traditional offline approach keeps the data in MySQL and joins tables on shared fields, but once the data volume is very large and many tables have to be joined, that approach can no longer cope.

The raw records are scattered, so the data can only be viewed from a single dimension at a time, which gives a very narrow picture; analysing across several dimensions by joining on demand is expensive.

Building the relationships between records up front lets related-data queries respond in milliseconds.
It also provides training data for data mining and machine learning,

which later need to analyse and model the data across multiple dimensions.

(2) In HBase, writes that share the same rowkey belong to one and the same record
QQ
aa-aa-aa-aa-aa-aa 666666

WeChat
aa-aa-aa-aa-aa-aa weixin

Mail
aa-aa-aa-aa-aa-aa 666666@qq.com
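
To make that merge concrete, here is a minimal Java sketch against the HBase client API already used in this module. It assumes an open Connection, the test:relation table and the cf column family named in this chapter; the qq / wechat / mail qualifiers and the class name are only illustrative.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SameRowkeyDemo {

    //Three writes that share the rowkey aa-aa-aa-aa-aa-aa end up in one HBase row.
    public static void writeAndRead(Connection conn) throws Exception {
        byte[] cf = Bytes.toBytes("cf");
        byte[] rowkey = Bytes.toBytes("aa-aa-aa-aa-aa-aa");

        try (Table table = conn.getTable(TableName.valueOf("test:relation"))) {
            Put qq = new Put(rowkey);
            qq.addColumn(cf, Bytes.toBytes("qq"), Bytes.toBytes("666666"));
            Put wechat = new Put(rowkey);
            wechat.addColumn(cf, Bytes.toBytes("wechat"), Bytes.toBytes("weixin"));
            Put mail = new Put(rowkey);
            mail.addColumn(cf, Bytes.toBytes("mail"), Bytes.toBytes("666666@qq.com"));
            table.put(java.util.Arrays.asList(qq, wechat, mail));

            //One Get on the shared rowkey returns the merged record.
            Result result = table.get(new Get(rowkey));
            System.out.println("qq     = " + Bytes.toString(result.getValue(cf, Bytes.toBytes("qq"))));
            System.out.println("wechat = " + Bytes.toString(result.getValue(cf, Bytes.toBytes("wechat"))));
            System.out.println("mail   = " + Bytes.toString(result.getValue(cf, Bytes.toBytes("mail"))));
        }
    }
}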

(3) How to relate the data
The one-to-one case:
https://blog.csdn.net/shujuelin/article/details/83657485

Use HBase's write semantics: writes with the same rowkey land in the same row, for example
MAC1 1789932321
MAC1 88888@qq.com
MAC1 88888

How do we handle the one-to-many case?
Use multiple versions of the same cell:
aa-aa-aa-aa-aa-aa 666666
aa-aa-aa-aa-aa-aa 777777

(4) One-to-many
Multiple versions store the one-to-many relationship, but with the default timestamp-based versions,
inserting 777777 creates one version and inserting 777777 again creates yet another duplicate version.

So we need a custom version number that makes each value map to exactly one version, computed as
"888888".hashCode() & Integer.MAX_VALUE  (see the sketch below)
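
The following Java sketch shows the one-to-many pattern with the custom version formula used later in DataRelationStreaming, (field + value).hashCode() & Integer.MAX_VALUE. It assumes the cf family was created with multiple versions enabled (the project creates it with 100) and that a Connection is available; class and method names are illustrative, not the project's own.

import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiVersionRelationDemo {

    private static final byte[] CF = Bytes.toBytes("cf");
    private static final byte[] COL = Bytes.toBytes("phone");

    //Store one MAC -> many phone numbers in a single cell, one version per value.
    public static void writePhones(Connection conn, String mac, String... phones) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("test:relation"))) {
            for (String phone : phones) {
                //Custom version: writing the same value twice reuses the same version,
                //so duplicates collapse instead of piling up as extra versions.
                long version = ("phone" + phone).hashCode() & Integer.MAX_VALUE;
                Put put = new Put(Bytes.toBytes(mac));
                put.addColumn(CF, COL, version, Bytes.toBytes(phone));
                table.put(put);
            }
        }
    }

    //Read every stored version back to get all phone numbers related to the MAC.
    public static void readPhones(Connection conn, String mac) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("test:relation"))) {
            Get get = new Get(Bytes.toBytes(mac));
            get.setMaxVersions();                      //return all stored versions
            Result result = table.get(get);
            List<Cell> cells = result.getColumnCells(CF, COL);
            for (Cell cell : cells) {
                System.out.println(mac + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}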

(5) How to query HBase by multiple fields
Write into the main relation table test:relation: rowkey=>aa-aa-aa-aa-aa-aa version=>1637094383 field phone_mac value=>aa-aa-aa-aa-aa-aa
Write into the secondary index table test:phone_mac: rowkey=>aa-aa-aa-aa-aa-aa version=>1736188717 value=>aa-aa-aa-aa-aa-aa

Hbase关联

Queries never hit the main relation table directly: the query field is not part of its rowkey, so such a query is either impossible or extremely slow.

A query is therefore two rowkey lookups (see the sketch below):
Step 1: use the query field's value as the rowkey of the corresponding secondary index table to find the rowkey of the main relation table.
Step 2: use that rowkey to fetch the full record from the main relation table.

When the WiFi data has already been ingested, its phone number must also have been ingested before it can be found this way.
Suppose the phone number from the WiFi data has not been ingested yet,

and a base record arrives first, with no MAC and therefore no primary key:

Card            Phone
400000000000000 18612345678

It gets related as:

Phone        value (this only works if the value can be recognised as an ID-card number)
18612345678  400000000000000

1) Because retrieval always goes through the index table straight to the MAC, an ID-card value has now been mixed in.
2) These records need to be merged later.
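
A minimal Java sketch of the two-step lookup, using the names written by DataRelationStreaming below (test:phone as the secondary index, test:relation as the main table, column family cf, index column phone_mac). Only the newest MAC version is read from the index row here, which is a simplification; the class and method names are illustrative.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SecondaryIndexQueryDemo {

    private static final byte[] CF = Bytes.toBytes("cf");

    //Find the full record for a phone number: test:phone -> rowkey (MAC) -> test:relation.
    public static Result queryByPhone(Connection conn, String phone) throws Exception {
        //Step 1: the phone number is the rowkey of the secondary index table;
        //its cell value is the rowkey (phone_mac) of the main relation table.
        byte[] mac;
        try (Table indexTable = conn.getTable(TableName.valueOf("test:phone"))) {
            Result indexRow = indexTable.get(new Get(Bytes.toBytes(phone)));
            mac = indexRow.getValue(CF, Bytes.toBytes("phone_mac"));
            if (mac == null) {
                return null;   //the phone number has not been indexed yet
            }
        }

        //Step 2: use the MAC as the rowkey of the main relation table to fetch the full record.
        try (Table relationTable = conn.getTable(TableName.valueOf("test:relation"))) {
            return relationTable.get(new Get(mac));
        }
    }
}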

(6) Relation and secondary index diagram

关联及二级索引示意

Hbase关联表示意图

(7) Using ES to build the secondary index

使用ES建立二级索引

If an HBase row holds 100 fields with the full record but only 20 of them are ever used for querying and retrieval, we can pull those 20 fields out and store them in ES, because ES handles multi-field, multi-condition queries very flexibly. We first run the conditional search in ES, take the HBase rowkeys from the search results, and then use those rowkeys to fetch the full records from HBase.
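
A hedged sketch of this ES-then-HBase flow. The index name relation, the phone field and the assumption that the ES document _id carries the HBase rowkey are illustrative only, not taken from the project; the TransportClient-style API (prepareSearch / termQuery) is assumed to match the ES client used in the xz_bigdata_es module.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class EsIndexQueryDemo {

    //Search the 20 query fields in ES, then fetch the 100-field full records from HBase by rowkey.
    public static void queryByPhone(Client esClient, Connection hbaseConn, String phone) throws Exception {
        //Step 1: conditional search in ES; the document _id is assumed to hold the HBase rowkey.
        SearchResponse response = esClient.prepareSearch("relation")
                .setQuery(QueryBuilders.termQuery("phone", phone))
                .setSize(100)
                .get();

        //Step 2: take each hit's rowkey and read the full record from HBase.
        try (Table table = hbaseConn.getTable(TableName.valueOf("test:relation"))) {
            for (SearchHit hit : response.getHits().getHits()) {
                Result full = table.get(new Get(Bytes.toBytes(hit.getId())));
                System.out.println("rowkey=" + hit.getId() + ", cells=" + full.size());
            }
        }
    }
}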

(8) HBase pre-splitting
Regions are pre-split mainly according to the distribution of the rowkeys.

Pre-splitting mainly exists to avoid hot-spotting on a single region.

Take the relation table as an example: its rowkey is the MAC address.

phone_mac values all start with 0-9 or a-f,
device_mac values all start with 0-9 or a-z,
and HBase sorts rowkeys lexicographically, so the split keys can be chosen on those leading characters (see the sketch below).
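
The project's SpiltRegionUtil.getSplitKeysBydinct is not shown in this section, so the Java sketch below is only an assumption of what hex-prefix split keys for such rowkeys could look like, not the project's actual implementation.

import java.util.TreeSet;

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitDemo {

    //Rowkeys (MACs) start with 0-9 / a-f; one split point per leading hex character
    //spreads writes over 16 regions instead of hammering a single hot region.
    public static byte[][] hexSplitKeys() {
        TreeSet<byte[]> splits = new TreeSet<>(Bytes.BYTES_COMPARATOR);
        for (char c : "123456789abcdef".toCharArray()) {   //15 split points -> 16 regions
            splits.add(Bytes.toBytes(String.valueOf(c)));
        }
        return splits.toArray(new byte[0][]);
    }

    public static void createRelationTable(Admin admin) throws Exception {
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("test:relation"));
        desc.addFamily(new HColumnDescriptor("cf").setMaxVersions(100));
        admin.createTable(desc, hexSplitKeys());
    }
}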

(9) Custom version numbers
With this mapping we can pinpoint the exact version that holds a given value in a multi-version cell, and then delete data version by version using that number.
156511 aaaaaaaa
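
A small illustrative Java sketch of deleting a single version by recomputing its version number with the same formula used when writing. Table, family and column names follow this chapter; everything else (class name, parameters) is assumed.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionDeleteDemo {

    //Because the version of a value is derived from the value itself,
    //one value (e.g. the phone number 888888) can be removed without touching the others.
    public static void deletePhoneVersion(Connection conn, String mac, String phone) throws Exception {
        long version = ("phone" + phone).hashCode() & Integer.MAX_VALUE;   //same formula as on write
        try (Table table = conn.getTable(TableName.valueOf("test:relation"))) {
            Delete delete = new Delete(Bytes.toBytes(mac));
            delete.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("phone"), version);  //delete exactly that version
            table.delete(delete);
        }
    }
}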

2、DataRelationStreaming—数据关联

DataRelationStreaming.scala

package com.hsiehchou.spark.streaming.kafka.kafka2hbase

import java.util.Properties

import com.hsiehchou.common.config.ConfigUtil
import com.hsiehchou.hbase.config.HBaseTableUtil
import com.hsiehchou.hbase.insert.HBaseInsertHelper
import com.hsiehchou.hbase.spilt.SpiltRegionUtil
import com.hsiehchou.spark.common.SparkContextFactory
import com.hsiehchou.spark.streaming.kafka.Spark_Kafka_ConfigUtil
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaManager

object DataRelationStreaming extends Serializable with Logging{

  // 读取需要关联的配置文件字段
  // phone_mac,phone,username,send_mail,imei,imsi
  val relationFields = ConfigUtil.getInstance()
    .getProperties("spark/relation.properties")
    .get("relationfield")
    .toString
    .split(",")
  def main(args: Array[String]): Unit = {

    //初始化hbase表
    //initRelationHbaseTable(relationFields)

    val ssc = SparkContextFactory.newSparkLocalStreamingContext("DataRelationStreaming", java.lang.Long.valueOf(10),1)
    //  val ssc = SparkContextFactory.newSparkStreamingContext("DataRelationStreaming", java.lang.Long.valueOf(10))

    val kafkaConfig: Properties = ConfigUtil.getInstance().getProperties("kafka/kafka-server-config.properties")
    val topics = "chl_test7".split(",")
    val kafkaDS = new KafkaManager(Spark_Kafka_ConfigUtil
      .getKafkaParam(kafkaConfig.getProperty("metadata.broker.list"),
        "DataRelationStreaming2"))
      .createJsonToJMapStringDirectStreamWithOffset(ssc, topics.toSet)
      .persist(StorageLevel.MEMORY_AND_DISK)

    kafkaDS.foreachRDD(rdd=>{

      rdd.foreachPartition(partion=>{
        //对partion进行遍历
        while (partion.hasNext){

          //获取每一条流数据
          val map = partion.next()

          //获取mac 主键
          var phone_mac:String = map.get("phone_mac")

          //获取所有关联字段 //phone_mac,phone,username,send_mail,imei,imsi
          relationFields.foreach(relationFeild =>{
            //relationFields 是关联字段,需要进行关联处理的,所有判断
            //map中是不是包含这个字段,如果包含的话,取出来进行处理
            if(map.containsKey(relationFeild)){
              //创建主关联,并遍历关联字段进行关联
              val put = new Put(phone_mac.getBytes())

              //取关联字段的值
              //TODO  到这里  主关联表的 主键和值都有了  然后封装成PUT写入hbase主关联表就行了
              val value = map.get(relationFeild)

              //自定义版本号  通过 (表字段名 + 字段值 取hashCOde)
              //因为值有可能是字符串,但是版本号必须是long类型,所以这里我们需要
              //将字符串影射唯一数字,而且必须是正整数
              val versionNum = (relationFeild+value).hashCode() & Integer.MAX_VALUE
              put.addColumn("cf".getBytes(), Bytes.toBytes(relationFeild),versionNum ,Bytes.toBytes(value.toString))
              HBaseInsertHelper.put("test:relation",put)
              println(s"往主关联表 test:relation 里面写入数据  rowkey=>${phone_mac} version=>${versionNum} 类型${relationFeild} value=>${value}")

              // 建立二级索引
              // 使用关联字段的值最为二级索引的rowkey
              // 二级索引就是把这个字段的值作为索引表rowkey
              // 把这个字段的mac做为索引表的值
              val put_2 = new Put(value.getBytes())//把这个字段的值作为索引表rowkey
              val table_name = s"test:${relationFeild}"//往索引表里面取写
              //使用主表的rowkey  就是 取hash作为二级索引的版本号
              val versionNum_2 = phone_mac.hashCode() & Integer.MAX_VALUE
              put_2.addColumn("cf".getBytes(), Bytes.toBytes("phone_mac"),versionNum_2 ,Bytes.toBytes(phone_mac.toString))
              HBaseInsertHelper.put(table_name,put_2)
              println(s"往二级索表 ${table_name}里面写入数据  rowkey=>${value} version=>${versionNum_2} value=>${phone_mac}")
            }
          })
        }
      })
    })
    ssc.start()
    ssc.awaitTermination()
  }

  def initRelationHbaseTable(relationFields:Array[String]): Unit ={
    //初始化总关联表
    val relation_table = "test:relation"
    HBaseTableUtil.createTable(relation_table,
      "cf",
      true,
      -1,
      100,
      SpiltRegionUtil.getSplitKeysBydinct)
    //HBaseTableUtil.deleteTable(relation_table)

    //遍历所有关联字段,根据字段创建二级索引表
    relationFields.foreach(field=>{
      val hbase_table = s"test:${field}"
      HBaseTableUtil.createTable(hbase_table, "cf", true, -1, 100, SpiltRegionUtil.getSplitKeysBydinct)
      // HBaseTableUtil.deleteTable(hbase_table)
    })
  }
}

3、com.hsiehchou.spark.streaming

common/SparkContextFactory.scala

package com.hsiehchou.spark.common

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{Accumulator, SparkContext}

object SparkContextFactory {

  def newSparkBatchContext(appName:String = "sparkBatch") : SparkContext = {
    val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
    new SparkContext(sparkConf)
  }

  def newSparkLocalBatchContext(appName:String = "sparkLocalBatch" , threads : Int = 2) : SparkContext = {
    val sparkConf = SparkConfFactory.newSparkLoalConf(appName, threads)
    new SparkContext(sparkConf)
  }

  def getAccumulator(appName:String = "sparkBatch") : Accumulator[Int] = {
    val sparkConf = SparkConfFactory.newSparkBatchConf(appName)
    val accumulator: Accumulator[Int] = new SparkContext(sparkConf).accumulator(0,"")
    accumulator
  }

  /**
    * 创建本地流streamingContext
    * @param appName             appName
    * @param batchInterval      多少秒读取一次
    * @param threads            开启多少个线程
    * @return
    */
  def newSparkLocalStreamingContext(appName:String = "sparkStreaming" ,
                                    batchInterval:Long = 30L ,
                                    threads : Int = 4) : StreamingContext = {
    val sparkConf =  SparkConfFactory.newSparkLocalConf(appName, threads)
    // sparkConf.set("spark.streaming.receiver.maxRate","10000")
    sparkConf.set("spark.streaming.kafka.maxRatePerPartition","1")
    new StreamingContext(sparkConf, Seconds(batchInterval))
  }

  /**
    * 创建集群模式streamingContext
    * 这里不设置线程数,在submit中指定
    * @param appName
    * @param batchInterval
    * @return
    */
  def newSparkStreamingContext(appName:String = "sparkStreaming" , batchInterval:Long = 30L) : StreamingContext = {
    val sparkConf = SparkConfFactory.newSparkStreamingConf(appName)
    new StreamingContext(sparkConf, Seconds(batchInterval))
  }

  def startSparkStreaming(ssc:StreamingContext){
    ssc.start()
      ssc.awaitTermination()
      ssc.stop()
  }
}

streaming/kafka/Spark_Kafka_ConfigUtil.scala

package com.hsiehchou.spark.streaming.kafka

import org.apache.spark.Logging

object Spark_Kafka_ConfigUtil extends Serializable with Logging{

  def getKafkaParam(brokerList:String,groupId : String): Map[String,String]={
    val kafkaParam=Map[String,String](
      "metadata.broker.list" -> brokerList,
      "auto.offset.reset" -> "smallest",
      "group.id" -> groupId,
      "refresh.leader.backoff.ms" -> "1000",
      "num.consumer.fetchers" -> "8")
    kafkaParam
  }
}

4、com/hsiehchou/common/config/ConfigUtil

ConfigUtil.java

package com.hsiehchou.common.config;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfigUtil {

    private static Logger LOG = LoggerFactory.getLogger(ConfigUtil.class);

    private static ConfigUtil configUtil;

    public static ConfigUtil getInstance(){

        if(configUtil == null){
            configUtil = new ConfigUtil();
        }
        return configUtil;
    }

    public Properties getProperties(String path){
        Properties properties = new Properties();
        try {
            LOG.info("开始加载配置文件" + path);
            InputStream insss = this.getClass().getClassLoader().getResourceAsStream(path);
            properties.load(insss);
            LOG.info("加载配置文件" + path + "成功");
        } catch (IOException e) {
            LOG.info("加载配置文件" + path + "失败");
            LOG.error(null,e);
        }
        System.out.println("文件内容:"+properties);
        return properties;
    }

    public static void main(String[] args) {
        ConfigUtil instance = ConfigUtil.getInstance();
        Properties properties = instance.getProperties("common/datatype.properties");
        //Properties properties = instance.getProperties("spark/relation.properties");

       // properties.get("relationfield");
        System.out.println(properties);
    }
}

5、构建模块—xz_bigdata_hbase

pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata2</artifactId>
        <groupId>com.hsiehchou</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_hbase</artifactId>

    <name>xz_bigdata_hbase</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <hbase.version>1.2.0</hbase.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_resources</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>${hbase.version}-${cdh.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>guava</artifactId>
                    <groupId>com.google.guava</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>zookeeper</artifactId>
                    <groupId>org.apache.zookeeper</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>${hbase.version}-${cdh.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>servlet-api-2.5</artifactId>
                    <groupId>org.mortbay.jetty</groupId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

</project>

com/hsiehchou/hbase/config/HBaseConf.java

package com.hsiehchou.hbase.config;

import com.hsiehchou.hbase.spilt.SpiltRegionUtil;
import org.apache.commons.configuration.CompositeConfiguration;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.log4j.Logger;

import java.io.IOException;
import java.io.Serializable;

public class HBaseConf implements Serializable {

    private static final long serialVersionUID = 1L;
    private static final Logger LOG = Logger.getLogger(HBaseConf.class);

    private static final String HBASE_SERVER_CONFIG = "hbase/hbase-server-config.properties";
    private static final String HBASE_SITE = "hbase/hbase-site.xml";

    private volatile static HBaseConf hbaseConf;
    private CompositeConfiguration hbase_server_config;

    public CompositeConfiguration getHbase_server_config() {

        return hbase_server_config;
    }

    public void setHbase_server_config(CompositeConfiguration hbase_server_config) {
        this.hbase_server_config = hbase_server_config;
    }

    //hbase 配置文件
    private  Configuration configuration;
    //hbase 连接
    private volatile transient Connection conn;

    /**
     * 初始化HBaseConf的时候加载配置文件
     */
    private HBaseConf() {
        hbase_server_config = new CompositeConfiguration();
        //加载配置文件
        loadConfig(HBASE_SERVER_CONFIG,hbase_server_config);
        //初始化连接
        getHconnection();
    }

    //获取连接
    public Configuration getConfiguration(){
        if(configuration==null){
            configuration = HBaseConfiguration.create();
            configuration.addResource(HBASE_SITE);
            LOG.info("加载配置文件" + HBASE_SITE + "成功");
        }
        return configuration;
    }

    public BufferedMutator getBufferedMutator(String tableName) throws IOException {
        return getHconnection().getBufferedMutator(TableName.valueOf(tableName));
    }

    public Connection getHconnection(){

        if(conn==null){
            //获取配置文件
            getConfiguration();
            synchronized (HBaseConf.class) {
                if (conn == null) {
                    try {
                        conn = ConnectionFactory.createConnection(configuration);
                    } catch (IOException e) {
                        LOG.error(String.format("获取hbase的连接失败  参数为: %s", toString()), e);
                    }
                }
            }
        }
        return conn;
    }

    /**
     * 加载配置文件
     * @param path
     * @param configuration
     */
    private void loadConfig(String path,CompositeConfiguration configuration) {
        try {
            LOG.info("加载配置文件 " + path);
            configuration.addConfiguration(new PropertiesConfiguration(path));
            LOG.info("加载配置文件" + path +"成功。 ");
        } catch (ConfigurationException e) {
            LOG.error("加载配置文件 " + path + "失败", e);
        }
    }

    /**
     * 单例 初始化HBaseConf
     * @return
     */
    public static HBaseConf getInstance() {
        if (hbaseConf == null) {
            synchronized (HBaseConf.class) {
                if (hbaseConf == null) {
                    hbaseConf = new HBaseConf();
                }
            }
        }
        return hbaseConf;
    }

    public static void main(String[] args) {
        String hbase_table = "test:chl_test2";
        HBaseTableUtil.createTable(hbase_table, "cf", true, -1, 1, SpiltRegionUtil.getSplitKeysBydinct());

      /*  Connection hconnection = HBaseConf.getInstance().getHconnection();
        Connection hconnection1 = HBaseConf.getInstance().getHconnection();
        System.out.println(hconnection);
        System.out.println(hconnection1);*/
    }
}
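
HBaseConf 以单例方式维护全局唯一的 Connection,批量写入时一般通过 getBufferedMutator 拿到带客户端缓冲的写入器。下面是一个简单的 Java 使用示意(表名沿用上文 main 方法中的 test:chl_test2,rowkey 与列值为假设数据,HBaseConfDemo 只是演示用的类名):

import com.hsiehchou.hbase.config.HBaseConf;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseConfDemo {

    public static void main(String[] args) throws Exception {
        //单例:多次 getInstance() 返回同一个 HBaseConf,底层共享同一个 Connection
        HBaseConf conf = HBaseConf.getInstance();

        //BufferedMutator 会在客户端缓冲 Put,适合批量写入场景
        BufferedMutator mutator = conf.getBufferedMutator("test:chl_test2");

        Put put = new Put(Bytes.toBytes("demo_rowkey"));   //rowkey 为假设值
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("phone"), Bytes.toBytes("13800000000"));

        mutator.mutate(put);
        mutator.flush();   //手动刷新缓冲区,保证数据发送到 RegionServer
        mutator.close();
    }
}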

com/hsiehchou/hbase/config/HBaseTableFactory.java

package com.hsiehchou.hbase.config;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Table;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.Serializable;

public class HBaseTableFactory implements Serializable {

    private static final long serialVersionUID = -1071596337076137201L;

    private static final Logger LOG = LoggerFactory.getLogger(HBaseTableFactory.class);

    private HBaseConf conf;
    private transient Connection conn  ;
    private boolean isReady = true;

    public HBaseTableFactory(){

        conf = HBaseConf.getInstance();
        conn = conf.getHconnection();
        if(conn == null){
            isReady = false;
            LOG.warn("HBase 连接没有启动。");
        }
    }

    public HBaseTableFactory(Connection conn){
        this.conn = conn;
    }

    /**
      * 根据表名创建 表的实例
      * @param tableName
      * @return
      * @throws IOException
      * HTableInterface
     */
    public Table getHBaseTableInstance(String tableName) throws IOException{

        if(conn == null){
            if(conf == null){
                conf = HBaseConf.getInstance();
                isReady = true;
                LOG.warn("HBaseConf为空,重新初始化。");
            }
            synchronized (HBaseTableFactory.class) {
                if(conn == null) {
                    conn = conf.getHconnection();
                    LOG.warn("初始 hbase Connection 为空 , 获取  Connection成功。");
                }
            }
        }
        return  isReady ? conn.getTable(TableName.valueOf(tableName)) : null;
    }

    public HTable getHTable(String tableName) throws IOException{

        return  (HTable) getHBaseTableInstance(tableName);
    }

    public BufferedMutator getBufferedMutator(String tableName) throws IOException {
        return getConf().getBufferedMutator(tableName);
    }

    public boolean isReady() {
        return isReady;
    }

    private HBaseConf getConf(){
        if(conf == null){
            conf = HBaseConf.getInstance();
        }
        return conf;
    }

    public void close() throws IOException{
        conn.close();
        conn = null;
    }
}
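
通过工厂获取 Table 做单条读取的简单示意(表名沿用上文示例,rowkey 为假设值,TableFactoryDemo 只是演示用的类名),Table 非线程安全,用完即关:

import com.hsiehchou.hbase.config.HBaseTableFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TableFactoryDemo {

    public static void main(String[] args) throws Exception {
        HBaseTableFactory factory = new HBaseTableFactory();

        //根据表名拿到 Table 实例(假设表 test:chl_test2 已存在)
        Table table = factory.getHBaseTableInstance("test:chl_test2");

        //按 rowkey 查询一行(rowkey 为假设值)
        Result result = table.get(new Get(Bytes.toBytes("demo_rowkey")));
        System.out.println("是否为空行:" + result.isEmpty());

        table.close();
        //close() 会关闭底层共享的 Connection,只在整个进程收尾时调用
        factory.close();
    }
}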

com/hsiehchou/hbase/config/HBaseTableUtil.java

package com.hsiehchou.hbase.config;

import com.google.common.collect.Sets;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.*;

import static com.google.common.base.Preconditions.checkArgument;

public class HBaseTableUtil {

    private static final Logger LOG = LoggerFactory.getLogger(HBaseTableUtil.class);
    private static final String COPROCESSORCLASSNAME =  "org.apache.hadoop.hbase.coprocessor.AggregateImplementation";
    private static HBaseConf conf = HBaseConf.getInstance() ;

    private HBaseTableUtil(){}

    /**
     * 获取hbase 表连接
     * @param tableName
     * @return
     */
    public static Table getTable(String tableName){
        Table table =null;
        if(tableExists(tableName)){
            try {
                table = conf.getHconnection().getTable(TableName.valueOf(tableName));
            } catch (IOException e) {
                LOG.error(null,e);
            }
        }
        return table;
    }

    public static void close(Table table){
        if(table != null) {
            try {
                table.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * 判断   HBase中是否存在  名为  tableName 的表
     * @param tableName
     * @return  boolean
     */
    public static boolean tableExists(String tableName){

        boolean  isExists = false;
        try {
            isExists = conf.getHconnection().getAdmin().tableExists(TableName.valueOf(tableName));
        } catch (MasterNotRunningException e) {
            LOG.error("HBase  master  未运行 。 ", e);
        } catch (ZooKeeperConnectionException e) {
            LOG.error("zooKeeper 连接异常。 ", e);
        } catch (IOException e) {
            LOG.error("", e);
        }
        return isExists;
    }

    /**
     * 删除表
     * @param tableName
     * @return
     */
    public static boolean deleteTable(String tableName){

        boolean status = false;
        TableName name = TableName.valueOf(tableName);
        try {
            Admin admin = conf.getHconnection().getAdmin();
            if(admin.tableExists(name)){
                if(!admin.isTableDisabled(name)){
                    admin.disableTable(name);
                }
                admin.deleteTable(name);
            }else{
                LOG.warn(" HBase中不存在 表 " + tableName);
            }
            admin.close();
            status = true;
        } catch (MasterNotRunningException e) {
            LOG.error("HBase  master  未运行 。 ", e);
        } catch (ZooKeeperConnectionException e) {
            LOG.error("zooKeeper 连接异常。 ", e);
        } catch (IOException e) {
            LOG.error("", e);
        }
        return status;
    }

    /**
     * 清空表
     * @param tableName
     * @return
     */
    public static boolean truncateTable(String tableName){

        boolean status = false;
        TableName name = TableName.valueOf(tableName);

        try {
            Admin admin = conf.getHconnection().getAdmin();
            if(admin.tableExists(name)){
                if(admin.isTableAvailable(name)){
                    admin.disableTable(name);
                }
                admin.truncateTable(name, true);
            }else{
                LOG.warn(" HBase中不存在 表 " + tableName);
            }
            admin.close();
            status = true;
        } catch (MasterNotRunningException e) {
            LOG.error("HBase  master  未运行 。 ", e);
        } catch (ZooKeeperConnectionException e) {
            LOG.error("zooKeeper 连接异常。 ", e);
        } catch (IOException e) {
            LOG.error("", e);
        }
        return status;
    }

    /**
     * 创建HBase表
     * @param tableName
     * @param cf       列族名
     * @param inMemory
     * @param ttl    ttl < 0     则为永久保存
     */
    public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion){

        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, COPROCESSORCLASSNAME);

        return createTable(htd);
    }

    public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion,  boolean useSNAPPY){

        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY , COPROCESSORCLASSNAME);

        return createTable(htd);
    }

    public static boolean createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion,  boolean useSNAPPY, byte[][] splits){

        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY, COPROCESSORCLASSNAME);
        return createTable(htd , splits);

    }

    /**
     * @param tableName    表名
     * @param cf           列簇
     * @param inMemory     是否存在内存
     * @param ttl          数据过期时间
     * @param maxVersion   最大版本
     * @param splits       分区
     * @return
     */
    public static boolean createTable(String tableName,
                                      String cf,
                                      boolean inMemory,
                                      int ttl,
                                      int maxVersion,
                                      byte[][] splits){
        //返回表说明
        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, COPROCESSORCLASSNAME);
        //通过HTableDescriptor 和 splits 分区策略来定义表
        return createTable(htd , splits);
    }

    public static List<String> listTables(){

        List<String> list = new ArrayList<String>();
        Admin admin = null;

        try {
            admin = conf.getHconnection().getAdmin();
            TableName[] listTableNames = admin.listTableNames();
            for( TableName t :  listTableNames ){
                list.add( t.getNameAsString() );
            }
        } catch(IOException e )  {
            LOG.error("创建HBase表失败。", e);
        }finally{
            try {
                if(admin!=null){
                    admin.close();
                }
            } catch (IOException e) {
                LOG.error("", e);
            }
        }
        return list;
    }

    /**
     * 列出所有表
     * @param reg
     * @return
     */
    public static List<String> listTables(String reg){
        List<String> list = new ArrayList<String>();
        Admin admin = null;

        try {
            admin = conf.getHconnection().getAdmin();
            TableName[] listTableNames = admin.listTableNames(reg);
            for(TableName t :  listTableNames){
                list.add(t.getNameAsString());
            }
        } catch(IOException e)  {
            LOG.error("创建HBase表失败。", e);
        }finally{
            try {
                if(admin!=null){
                    admin.close();
                }
            } catch (IOException e) {
                LOG.error("", e);
            }
        }
        return list;
    }

    /**
     * 创建HBase表
     * @param tableName
     * @param cf       列族名
     * @param inMemory
     * @param ttl      ttl < 0     则为永久保存
     */
    public static boolean  createTable(String tableName, String cf, boolean inMemory, int ttl , int maxVersion, String ... coprocessorClassNames){
        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, coprocessorClassNames);
        return createTable(htd);
    }

    public static boolean  createTable( String tableName, String cf, boolean inMemory, int ttl, int maxVersion, boolean useSNAPPY, String ... coprocessorClassNames){
        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY, coprocessorClassNames);
        return createTable(htd);
    }

    public static boolean  createTable( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion ,  boolean useSNAPPY ,byte[][] splits, String ... coprocessorClassNames){
        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, useSNAPPY ,coprocessorClassNames);
        return createTable(htd,splits );
    }
    public static boolean  createTable(String tableName, String cf, boolean inMemory, int ttl, int maxVersion, byte[][] splits, String ... coprocessorClassNames){

        HTableDescriptor htd = createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, coprocessorClassNames);
        return createTable(htd,splits );
    }

    /**
     * 通过HTableDescriptor 和 分区 来构建hbase
     * @param htd
     * @param splits
     * @return
     */
    public static boolean createTable(HTableDescriptor htd, byte[][] splits){
        Admin admin = null;
        try {
            admin = conf.getHconnection().getAdmin();
            TableName tableName = htd.getTableName();
            boolean exist = admin.tableExists(tableName);
            if(exist){
                LOG.error("表"+tableName.getNameAsString() + "已经存在");
            }else{
                //使用Admin进行创建表
                admin.createTable(htd, splits);
            }
        } catch(IOException e )  {
            LOG.error("创建HBase表失败。", e);
            return false;
        }finally{
            try {
                if(admin!=null){
                    admin.close();
                }
            } catch (IOException e) {
                LOG.error("", e);
            }
        }
        return true;
    }

    public static boolean createTable(HTableDescriptor htd){
        Admin admin = null;
        try {
            admin = conf.getHconnection().getAdmin();
            if(admin.tableExists(htd.getTableName())){
                LOG.info("表" + htd.getTableName() + "已经存在");
            }else{
                admin.createTable(htd);
            }
        } catch(IOException e )  {
            LOG.error("创建HBase表失败。", e);
            return false;
        }finally{
            try {
                if(admin!=null){
                    admin.close();
                }
            } catch (IOException e) {
                LOG.error("", e);
            }
        }
        return true;
    }

    /**
     * 创建命名空间
     * @param nameSpace
     * @return
     */
    public static boolean createNameSpace(String nameSpace){

        Admin admin = null;
        try {
            admin = conf.getHconnection().getAdmin();
            NamespaceDescriptor[] listNamespaceDescriptors = admin.listNamespaceDescriptors();
            boolean exist = false;
            for(NamespaceDescriptor namespaceDescriptor : listNamespaceDescriptors){
                if(namespaceDescriptor.getName().equals(nameSpace)){
                    exist = true;
                }
            }
            if(!exist) admin.createNamespace(NamespaceDescriptor.create(nameSpace).build());
        } catch(IOException e )  {
            LOG.error("创建HBase命名空间失败。", e);
            return false;
        }finally{
            try {
                if(admin!=null){
                    admin.close();
                }
            } catch (IOException e) {
                LOG.error("", e);
            }
        }
        return true;
    }

    /**
     * 为 HBase中的表  tableName添加 协处理器  coprocessorClassName
     * @param tableName
     * @param coprocessorClassName    必须是已经存在与HBase集群中
     * @return  boolean
     */
    public static boolean addCoprocessorClassForTable(String tableName,String coprocessorClassName){

        boolean status = false;
        TableName name = TableName.valueOf(tableName);
        Admin admin = null;
        try {
            admin = conf.getHconnection().getAdmin();
            HTableDescriptor htd = admin.getTableDescriptor(name);
            if(!htd.hasCoprocessor(coprocessorClassName)){

                htd.addCoprocessor(coprocessorClassName);

                admin.disableTable(name);
                admin.modifyTable(name, htd);
                admin.enableTable(name);
            }else{
                LOG.warn(String.format("表 %s中已经存在协处理器%s", tableName, coprocessorClassName));
            }
            status = true;
        } catch (MasterNotRunningException e) {
            LOG.error("HBase  master  未运行 。 ", e);
        } catch (ZooKeeperConnectionException e) {
            LOG.error("zooKeeper 连接异常。 ", e);
        } catch (IOException e) {
            LOG.error("", e);
        }finally{
            try {
                if(admin!=null){
                    admin.close();
                }
            } catch (IOException e) {
                LOG.error("", e);
            }
        }
        return status;
    }

    /**
     * 为HBase中的表 tableName添加指定位置的 协处理器 jar
     * @param tableName
     * @param coprocessorClassName   jar中的具体的协处理器
     * @param jarPath     hdfs的路径
     * @param level       执行级别
     * @param kvs         运行参数    可以为 null
     * @return   boolean
     */
    public static boolean addCoprocessorJarForTable(String  tableName, String coprocessorClassName,String jarPath,int level ,Map<String, String> kvs ){
        boolean status = false;
        TableName name = TableName.valueOf(tableName);
        Admin admin = null;
        try {
            admin = conf.getHconnection().getAdmin();
            HTableDescriptor htd = admin.getTableDescriptor(name);
            if(!htd.hasCoprocessor(coprocessorClassName)){
                admin.disableTable(name);
                htd.addCoprocessor(coprocessorClassName, new Path(jarPath), level, kvs);
                admin.modifyTable(name, htd);
                admin.enableTable(name);
            }else{
                LOG.warn(String.format("表 %s中已经存在协处理器%s", tableName, coprocessorClassName));
            }
            status = true;
        } catch (MasterNotRunningException e) {
            LOG.error("HBase  master  未运行 。 ", e);
        } catch (ZooKeeperConnectionException e) {
            LOG.error("zooKeeper 连接异常。 ", e);
        } catch (IOException e) {
            LOG.error("", e);
        }finally{
            try {
                if(admin!=null){
                    admin.close();
                }
            } catch (IOException e) {
                LOG.error("", e);
            }
        }
        return status;
    }

    /**
     * @param tableName
     * @param cf
     * @param inMemory
     * @param ttl
     * @param maxVersion
     * @param coprocessorClassNames
     * @return
     */
    public static HTableDescriptor createHTableDescriptor( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion ,String ... coprocessorClassNames ){
        //默认开启SNAPPY压缩,并把调用方传入的协处理器透传下去
        return createHTableDescriptor(tableName, cf, inMemory, ttl, maxVersion, true, coprocessorClassNames);
    }

    /**
     * @param tableName
     * @param cf
     * @param inMemory
     * @param ttl
     * @param maxVersion
     * @param useSNAPPY
     * @param coprocessorClassNames
     * @return
     */
    public static HTableDescriptor createHTableDescriptor( String tableName,String cf,boolean inMemory, int ttl ,int maxVersion , boolean useSNAPPY , String ... coprocessorClassNames ){

        // 1.创建命名空间
        String[] split = tableName.split(":");
        if(split.length==2){
            createNameSpace(split[0]);
        }

        // 2.添加协处理器
        HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));
        for( String coprocessorClassName : coprocessorClassNames ){

            try {
                htd.addCoprocessor(coprocessorClassName);
            } catch (IOException e1) {
                LOG.error("为表" + tableName + " 添加协处理器失败。 ", e1);
            }
        }

        // 创建HColumnDescriptor
        HColumnDescriptor hcd = new HColumnDescriptor(cf);
        if( maxVersion > 0 )
            //定义最大版本号
            hcd.setMaxVersions(maxVersion);

        /**
         * 设置布隆过滤器
         * 默认是NONE 是否使用布隆过虑及使用何种方式
         * 布隆过滤可以每列族单独启用
         * Default = ROW 对行进行布隆过滤。
         * 对 ROW,行键的哈希在每次插入行时将被添加到布隆。
         * 对 ROWCOL,行键 + 列族 + 列族修饰的哈希将在每次插入行时添加到布隆
         * 使用方法: create 'table', {BLOOMFILTER => 'ROW'}
         * 启用布隆过滤可以节省读磁盘过程,可以有助于降低读取延迟
         * */
        hcd.setBloomFilterType(BloomType.ROWCOL);

        /**
         * hbase在LRU缓存基础之上采用了分层设计,整个blockcache分成了三个部分,分别是single、multi和inMemory。三者区别如下:
         * single:如果一个block第一次被访问,放在该优先队列中;
         * multi:如果一个block被多次访问,则从single队列转移到multi队列
         * inMemory:优先级最高,常驻cache,因此一般只有hbase系统的元数据,如meta表之类的才会放到inMemory队列中。普通的hbase列族也可以指定IN_MEMORY属性,方法如下:
         * create 'table', {NAME => 'f', IN_MEMORY => true}
         * 修改上表的inmemory属性,方法如下:
         * alter 'table',{NAME=>'f',IN_MEMORY=>true}
         * */
        hcd.setInMemory(inMemory);
        hcd.setScope(1);

        /**
         * 数据量大,边压边写也会提升性能的,毕竟IO是大数据的最严重的瓶颈,
         * 哪怕使用了SSD也是一样。众多的压缩方式中,推荐使用SNAPPY。从压缩率和压缩速度来看,
         * 性价比最高。
         **/
        if(useSNAPPY)hcd.setCompressionType(Compression.Algorithm.SNAPPY);

        //默认为NONE
        //如果数据存储时设置了编码, 在缓存到内存中的时候是不会解码的,这样和不编码的情况相比,相同的数据块,编码后占用的内存更小, 即提高了内存的使用率
        //如果设置了编码,用户必须在取数据的时候进行解码, 因此在内存充足的情况下会降低读写性能。
        //在任何情况下开启PREFIX_TREE编码都是安全的
        //不要同时开启PREFIX_TREE和SNAPPY
        //通常情况下 SNAPPY并不能比 PREFIX_TREE取得更好的优化效果
        //hcd.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);

        //默认为64k     65536
        //随着blocksize的增大, 系统随机读的吞吐量不断的降低,延迟也不断的增大,
        //64k大小比16k大小的吞吐量大约下降13%,延迟增大13%
        //128k大小比64k大小的吞吐量大约下降22%,延迟增大27%
        //对于随机读取为主的业务,可以考虑调低blocksize的大小

        //随着blocksize的增大, scan的吞吐量不断的增大,延迟也不断降低,
        //64k大小比16k大小的吞吐量大约增加33%,延迟降低24%
        //128k大小比64k大小的吞吐量大约增加7%,延迟降低7%
        //对于scan为主的业务,可以考虑调大blocksize的大小

        //如果业务请求以Get为主,则可以适当的减小blocksize的大小
        //如果业务是以scan请求为主,则可以适当的增大blocksize的大小
        //系统默认为64k, 是一个scan和get之间取的平衡值
        //hcd.setBlocksize(s)

        //设置表中数据的存储生命期,过期数据将自动被删除,
        // 例如如果只需要存储最近两天的数据,
        // 那么可以设置setTimeToLive(2 * 24 * 60 * 60)
        if( ttl < 0 ) ttl = HConstants.FOREVER;
        hcd.setTimeToLive(ttl);

        htd.addFamily( hcd);

        return htd;
    }

    public static boolean createTable(HBaseTableParam param){

        String nameSpace = param.getNameSpace();
        if(!"default".equalsIgnoreCase(nameSpace)){
            checkArgument(createNameSpace(nameSpace), String.format("创建命名空间%s失败。", nameSpace));
        }

        HTableDescriptor desc = createHTableDescriptor(param);
        byte[][] splits = param.getSplits();
        if(splits == null){
            return createTable(desc);
        }else{
            return createTable(desc, splits);
        }

    }

    public static HTableDescriptor createHTableDescriptor(HBaseTableParam param){

        String tableName = String.format("%s:%s", param.getNameSpace(), param.getTableName());
        HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));

        for(String coprocessorClassName : param.getCoprocessorClazz()){
            try {
                htd.addCoprocessor(coprocessorClassName);
            } catch (IOException e) {
                LOG.error(String.format("为表  %s 添加协处理器失败。", tableName), e);
            }
        }

        HColumnDescriptor hcd = new HColumnDescriptor(param.getCf());
        hcd.setBloomFilterType(param.getBloomType());
        hcd.setMaxVersions(param.getMaxVersions());
        hcd.setScope(param.getReplicationScope());
        hcd.setBlocksize(param.getBlocksize());
        hcd.setInMemory(param.isInMemory());
        hcd.setTimeToLive(param.getTtl());

        //按需开启数据块编码与压缩。IO是大数据最常见的瓶颈,边压边写通常能提升写入性能;众多压缩方式中SNAPPY的压缩率和压缩速度性价比最高
        if(param.isUsePrefix_tree())hcd.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);
        if(param.isUseSnappy())hcd.setCompressionType(Compression.Algorithm.SNAPPY);

        htd.addFamily( hcd);

        return htd;
    }

    public static void closeTable( Table table ){

        if( table != null ){
            try {
                table.close();
            } catch (IOException e) {
                LOG.error(" ", e);
            }
            table = null;
        }
    }

    public static byte[][] getSplitKeys() {
        //String[] keys = new String[]{"50|"};
        //String[] keys = new String[]{"25|","50|","75|"};
        //String[] keys = new String[]{"13|","26|","39|", "52|","65|","78|","90|"};
        String[] keys = new String[]{ "06|","13|","20|", "26|","33|", "39|","46|", "52|","58|", "65|","72|","78|", "84|","90|","95|"};
        //String[] keys = new String[]{"10|", "20|", "30|", "40|", "50|", "60|", "70|", "80|", "90|"};
        byte[][] splitKeys = new byte[keys.length][];
        TreeSet<byte[]> rows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);//升序排序
        for (int i = 0; i < keys.length; i++) {
            rows.add(Bytes.toBytes(keys[i]));
        }
        Iterator<byte[]> rowKeyIter = rows.iterator();
        int i = 0;
        while (rowKeyIter.hasNext()) {
            byte[] tempRow = rowKeyIter.next();
            rowKeyIter.remove();
            splitKeys[i] = tempRow;
            i++;
        }
        return splitKeys;
    }

    public static class HBaseTableParam{

        private final String nameSpace; //命名空间
        private final String tableName; //表名
        private final String cf;        //列簇
        private Set<String>  coprocessorClazz = Sets.newHashSet("org.apache.hadoop.hbase.coprocessor.AggregateImplementation");
        private int maxVersions = 1;    //版本号 默认为1
        private BloomType bloomType = BloomType.ROWCOL;
        private boolean inMemory = false;
        private int replicationScope = 1;
        private boolean useSnappy = false; //默认不使用压缩
        private boolean usePrefix_tree = false;
        private int blocksize = 65536;
        private int ttl = HConstants.FOREVER;

        private byte[][] splits;

        public HBaseTableParam(String nameSpace, String tableName, String cf) {
            super();
            this.nameSpace = nameSpace == null ? "default" : nameSpace;
            this.tableName = tableName;
            this.cf = cf;
        }

        public String getNameSpace() {
            return nameSpace;
        }

        public String getTableName() {
            return tableName;
        }

        public String getCf() {
            return cf;
        }

        public Set<String> getCoprocessorClazz() {
            return coprocessorClazz;
        }

        public void clearCoprocessor(){
            coprocessorClazz.clear();
        }
        public void addCoprocessorClazz(String clazz) {
            this.coprocessorClazz.add(clazz);
        }

        public void addCoprocessorClazz(String ... clazz) {
            addCoprocessorClazz(Arrays.asList(clazz));
        }

        public void addCoprocessorClazz(Collection<String>  clazz) {
            this.coprocessorClazz.addAll(clazz);
        }

        public int getMaxVersions() {
            return maxVersions;
        }

        public void setMaxVersions(int maxVersions) {
            this.maxVersions = maxVersions <= 0 ? 1 : maxVersions;
        }

        public BloomType getBloomType() {
            return bloomType;
        }

        public void setBloomType(BloomType bloomType) {
            this.bloomType = bloomType == null ? BloomType.ROWCOL : bloomType;
        }

        public boolean isInMemory() {
            return inMemory;
        }

        public void setInMemory(boolean inMemory) {
            this.inMemory = inMemory;
        }

        public int getReplicationScope() {
            return replicationScope;
        }

        public void setReplicationScope(int replicationScope) {
            this.replicationScope = replicationScope < 0 ? 1 : replicationScope;
        }

        public boolean isUseSnappy() {
            return useSnappy;
        }

        /**
         * 控制是否使用 snappy 压缩数据, 默认是不启用
         * @param useSnappy
         */
        public void setUseSnappy(boolean useSnappy) {
            this.useSnappy = useSnappy;
        }

        public boolean isUsePrefix_tree() {
            return usePrefix_tree;
        }

        /**
         * 控制是否使用数据编码,默认是不使用
         *
         * 如果数据存储时设置了编码, 在缓存到内存中的时候是不会解码的,这样和不编码的情况相比,相同的数据块,编码后占用的内存更小, 即提高了内存的使用率
         * 如果设置了编码,用户必须在取数据的时候进行解码, 因此在内存充足的情况下会降低读写性能。
         * 在任何情况下开启PREFIX_TREE编码都是安全的
         * 不要同时开启PREFIX_TREE和SNAPPY
         * 通常情况下 SNAPPY并不能比 PREFIX_TREE取得更好的优化效果
         */
        public void setUsePrefix_tree(boolean usePrefix_tree) {
            this.usePrefix_tree = usePrefix_tree;
        }

        public int getBlocksize() {
            return blocksize;
        }

        /**
         *默认为64k     65536
         *随着blocksize的增大, 系统随机读的吞吐量不断的降低,延迟也不断的增大,
         *64k大小比16k大小的吞吐量大约下降13%,延迟增大13%
         *128k大小比64k大小的吞吐量大约下降22%,延迟增大27%
         *对于随机读取为主的业务,可以考虑调低blocksize的大小
         *
         *随着blocksize的增大, scan的吞吐量不断的增大,延迟也不断降低,
         *64k大小比16k大小的吞吐量大约增加33%,延迟降低24%
         *128k大小比64k大小的吞吐量大约增加7%,延迟降低7%
         *对于scan为主的业务,可以考虑调大blocksize的大小
         *
         *如果业务请求以Get为主,则可以适当的减小blocksize的大小
         *如果业务是以scan请求为主,则可以适当的增大blocksize的大小
         *系统默认为64k, 是一个scan和get之间取的平衡值
         *
         */
        public void setBlocksize(int blocksize) {
            this.blocksize = blocksize <= 0 ? 65536 : blocksize;
        }

        public int getTtl() {
            return ttl;
        }

        /**
         * 默认是永久保存
         * @param ttl  大于零的整数;ttl <= 0 时为永久保存
         */
        public void setTtl(int ttl) {
            this.ttl = ttl <= 0 ? HConstants.FOREVER : ttl;
        }

        public byte[][] getSplits() {
            return splits;
        }

        /**
         * 预分区的rowKey范围配置
         * @param splits
         */
        public void setSplits(byte[][] splits) {
            this.splits = splits;
        }
    }

    public static void main(String[] args) throws Exception{
        Admin admin = conf.getHconnection().getAdmin();
        System.out.println(admin);
        //deleteTable("test:user");
        // HBaseTableUtil.createTable("aaaaa","info1",true,-1,1);
        //  HBaseTableUtil.truncateTable("aaaaa");
     /*   boolean b = tableExists("test:user2");
        Table table = getTable("test:user2");
        System.out.println("=================="+table);
        System.out.println("=================="+table.getName());*/

        //HBaseTableUtil.deleteTable("aaaaa");

       /* Table table = HBaseTableUtil.getTable("countform:typecount");
        System.out.println(table);*/
/*
        boolean b = HBaseTableUtil.tableExists("countform:typecount");
        System.out.println(b);*/

        HBaseTableUtil.deleteTable("tanslator");
        HBaseTableUtil.deleteTable("ability");
        HBaseTableUtil.deleteTable("task");
        HBaseTableUtil.deleteTable("paper");

        //  HbaseSearchService hbaseSearchService=new HbaseSearchService();
        //  Map<String, String> stringStringMap = hbaseSearchService.get("countform:bsid","", new BaseMapRowExtrator());
        // Map<String, String> aaaaa = hbaseSearchService.get("countform:bsid", "aaaaa", new BaseMapRowExtrator());
        // System.out.println(aaaaa);
    }
}
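
建表时通常配合预分区使用。下面的 Java 示意用 getSplitKeys() 返回的 15 个分区键把表切成 16 个 region,并开启 SNAPPY 压缩(表名为假设值,CreateTableDemo 只是演示用的类名):

import com.hsiehchou.hbase.config.HBaseTableUtil;

public class CreateTableDemo {

    public static void main(String[] args) {
        //预分区键:按两位数字前缀切分,15 个切分点对应 16 个 region
        byte[][] splits = HBaseTableUtil.getSplitKeys();

        //表名为假设值;cf 为列簇;inMemory=false;ttl=-1 表示永久保存;maxVersion=1;useSNAPPY=true
        boolean ok = HBaseTableUtil.createTable("test:demo_table", "cf", false, -1, 1, true, splits);
        System.out.println("建表结果:" + ok);
    }
}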

com/hsiehchou/hbase/entity/AbstractRow.java

package com.hsiehchou.hbase.entity;

import com.google.common.collect.HashMultimap;
import com.google.common.collect.Sets;

import java.util.Collection;
import java.util.Map;
import java.util.Set;

public abstract class AbstractRow<T extends HBaseCell> {

    protected String rowKey;
    protected HashMultimap<String, T> cells;

    protected Set<String> fields;
    protected long maxCapTime;

    public AbstractRow(String rowKey){
        this.rowKey = rowKey;
        cells = HashMultimap.create();
        fields = Sets.newHashSet();
    }

    public boolean addCell(String field, String value, long capTime){

        return addCell(field, createCell(field, value, capTime));
    }

    public boolean addCell(String field, T cell){

        fields.add(cell.getField());

        if(cell.getCapTime() > maxCapTime)
            maxCapTime = cell.getCapTime();

        return cells.put(field, cell);
    }

    public boolean[] addCell(String field, Collection<T> cells){

        boolean[] status = new boolean[cells.size()];
        int n = 0;
        for(T cell : cells){
            status[n] = addCell(field, cell);
            n++;
        }
        return status;
    }

    public String getRowKey() {
        return rowKey;
    }

    protected abstract T createCell(String field, String value, long capTime);

    public Map<String, Collection<T>> getCell() {
        return cells.asMap();
    }

    public Collection<T> getCellByField(String field){
        return cells.get(field);
    }

    public Set<Map.Entry<String, T>> entries(){
        return  cells.entries();
    }

    @Override
    public String toString() {
        return "AbstractRow [rowKey=" + rowKey + ", cells=" + cells + "]";
    }

    public boolean equals(Object obj) {

       if(this == obj)return true ;
       if(!(obj instanceof AbstractRow))return false ;

       @SuppressWarnings("unchecked")
       AbstractRow<T> row = (AbstractRow<T>) obj;
       if(rowKey.equals(row.getRowKey()))return true;
       return false;
    }

    public int hashCode(){
        return this.rowKey.hashCode();
    }

    public long getMaxCapTime() {
        return maxCapTime;
    }

    public Set<String> getFields() {
        return Sets.newHashSet(fields);
    }
}

com/hsiehchou/hbase/entity/HBaseCell.java

package com.hsiehchou.hbase.entity;

public class HBaseCell implements Comparable<HBaseCell>{

    protected String field;           
    protected String value;
    protected Long capTime;

    public HBaseCell(String field, String value, long capTime){

        this.field = field;
        this.capTime = capTime;
        this.value = value;
    }

    public String getField(){
        return field;
    }

    public String getValue(){
        return value;
    }

    public void setCapTime(long capTime) {
        this.capTime = capTime;
    }

    public Long getCapTime() {
        return capTime;
    }

    public String toString(){
        return String.format("%s_[%s]_%s", field, capTime, value);
    }

    public int compareTo(HBaseCell o) {
        return o.getCapTime().compareTo(this.capTime);
    }

    public boolean equals(Object obj) {

       if(this == obj)return true ;
       if(!(obj instanceof HBaseCell))return false ;

       HBaseCell cell = (HBaseCell)obj;
       if(field.equals(cell.getField()) && value.equals(cell.getValue())){
           if(cell.getCapTime() < capTime){
               cell.setCapTime(this.capTime);
           }
           return true;
       }
       return false;
    }

    public int hashCode(){
        return this.field.hashCode() +  31*this.value.hashCode();
    }

}

com/hsiehchou/hbase/entity/HBaseRow.java

package com.hsiehchou.hbase.entity;

public class HBaseRow extends AbstractRow<HBaseCell> {

    public HBaseRow(String rowKey){
        super(rowKey);
    }

    public boolean[] addCell(String field, HBaseCell ... cells){

        boolean[] status = new boolean[cells.length];
        for(int i = 0; i < cells.length; i++){
            status[i] = addCell(field, cells[i]);
        }
        return status;
    }

    protected HBaseCell createCell(String field, String value, long capTime) {
        return new HBaseCell(field, value, capTime);
    }
}
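
HBaseRow/HBaseCell 用来在内存里聚合同一 rowkey 下的多列多值数据,相同 field+value 的 cell 会被去重并保留较大的采集时间。简单示意如下(rowkey、字段值与时间均为假设数据,HBaseRowDemo 只是演示用的类名):

import com.hsiehchou.hbase.entity.HBaseRow;

public class HBaseRowDemo {

    public static void main(String[] args) {
        HBaseRow row = new HBaseRow("demo_rowkey");   //rowkey 为假设值

        row.addCell("phone", "13800000000", 1000L);
        row.addCell("phone", "13800000000", 2000L);   //field+value 相同,被去重,capTime 保留较大值 2000
        row.addCell("imei", "861234567890123", 1500L);

        System.out.println("字段集合:" + row.getFields());          //包含 phone 和 imei 两个字段
        System.out.println("最大采集时间:" + row.getMaxCapTime());   //2000
        System.out.println("phone 列的值:" + row.getCellByField("phone"));
    }
}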

com/hsiehchou/hbase/extractor/BaseListRowExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BaseListRowExtrator implements RowExtractor<List<String>>{

    private List<String> row;

    //该行中 cjtime(采集时间)的最大值
    public Long lastcjtime = 0L;

    //该行中 cjtime 的最小值,初始为 Long.MAX_VALUE,遇到更小的值才更新
    public Long firstcjtime = Long.MAX_VALUE;

    @Override
    public List<String> extractRowData(Result result, int rowNum)
            throws IOException {

        row = new ArrayList<String>();
        for(Cell cell :  result.listCells()) {
            String column = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
            if(column.equalsIgnoreCase("cjtime")) {
                Long v = Long.parseLong(value);
                if(lastcjtime < v) {
                    lastcjtime = v;
                }
                if(firstcjtime > v) {
                    firstcjtime = v;
                }
            }
            row.add(value);
        }
        return row;
    }
}

com/hsiehchou/hbase/extractor/BaseMapRowExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BaseMapRowExtrator implements RowExtractor<Map<String,String>> {

    private Map<String,String> row;

    private List<byte[]> rows;
    private String longTimeField;
    private SimpleDateFormat format;

    private String field;
    private String value;

    private long time;

    public BaseMapRowExtrator(){}

    /**
     * @param rows   需要提取 所有的 rowKey  , null 则不提取
     */
    public BaseMapRowExtrator(List<byte[]> rows){
        this.rows = rows;
    }

    /**
     * @param rows             需要提取 所有的 rowKey  , null 则不提取
     * @param longTimeField    long类型的时间字段   表示需要将其转换称 String 类型
     */
    public BaseMapRowExtrator(List<byte[]> rows,String longTimeField){
        this.rows = rows;
        this.longTimeField = longTimeField;
    }

    /**
     * @param rows                  需要提取 所有的 rowKey  , null 则不提取
     * @param longTimeField         long类型的时间字段
     * @param timePattern           表示需要已该指定的格式  将时间字段的值转换成字符串
     */
    public BaseMapRowExtrator(List<byte[]> rows,String longTimeField,String timePattern){
        this.rows = rows;
        this.longTimeField = longTimeField;
        if(StringUtils.isNotBlank(timePattern)){
            format = new SimpleDateFormat(timePattern);
        }
    }

    public Map<String, String> extractRowData(Result result, int rowNum) throws IOException {

            row = new HashMap<String,String>();

            List<Cell> cells = result.listCells();
            for(Cell cell :  cells) {
                field = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
                if( field.equals(longTimeField)  ){
                    time = Bytes.toLong(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
                    if( format != null ){
                        value = format.format(new Date(time));
                    }else{
                        value = String.valueOf(time);
                    }
                }else{
                    value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
                }
                row.put(field,value);
            }

            if( rows != null ){
                rows.add(result.getRow());
            }
        return row;
    }
}
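
这个解析器把一行 Result 转成 字段->值 的 Map,并可把 long 型存储的时间字段格式化成字符串。下面是查询单行并解析的 Java 示意(表名沿用上文示例,rowkey 为假设值,并假设 collect_time 列以 long 存储,MapExtractorDemo 只是演示用的类名):

import com.hsiehchou.hbase.config.HBaseTableUtil;
import com.hsiehchou.hbase.extractor.BaseMapRowExtrator;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapExtractorDemo {

    public static void main(String[] args) throws Exception {
        Table table = HBaseTableUtil.getTable("test:chl_test2");   //假设该表已存在

        //收集本次解析到的 rowkey;假设 collect_time 列以 long 存储,按指定格式转成字符串
        List<byte[]> rowKeys = new ArrayList<byte[]>();
        BaseMapRowExtrator extrator = new BaseMapRowExtrator(rowKeys, "collect_time", "yyyy-MM-dd HH:mm:ss");

        Result result = table.get(new Get(Bytes.toBytes("demo_rowkey")));   //rowkey 为假设值
        if (!result.isEmpty()) {
            Map<String, String> row = extrator.extractRowData(result, 0);
            System.out.println(row);
        }

        HBaseTableUtil.close(table);
    }
}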

com/hsiehchou/hbase/extractor/BaseMapWithRowKeyExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class BaseMapWithRowKeyExtrator implements RowExtractor<Map<String,String>> {

    private Map<String,String> row;

    @Override
    public Map<String, String> extractRowData(Result result, int rowNum)
            throws IOException {

        row = new HashMap<String,String>();
        row.put("rowKey", Bytes.toString( result.getRow() ));

        for(Cell cell :  result.listCells()) {
            row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
        }
        return row;
    }
}

com/hsiehchou/hbase/extractor/BeanRowExtrator.java

package com.hsiehchou.hbase.extractor;

import com.google.common.collect.Maps;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Map;

public class BeanRowExtrator<T> implements RowExtractor<T> {

    private static final Logger LOG = LoggerFactory.getLogger(BeanRowExtrator.class);

    private Class<T> clazz;
    private Map<String,Field> fieldMap;

    public BeanRowExtrator(Class<T> clazz){
        this.clazz = clazz;
        this.fieldMap = getDeclaredFields(clazz);
    }

    public T extractRowData(Result result, int rowNum) throws IOException {
        return resultReflectToClass(result, rowNum);
    }

    private T resultReflectToClass(Result result, int rowNum){
        String column = null;
        Field field = null;
        T obj = null;
        try {
            obj = clazz.newInstance();
            for(Cell cell : result.listCells()){
                column = Bytes.toString(cell.getQualifierArray(),
                        cell.getQualifierOffset(), cell.getQualifierLength());
                /*检查该列是否在实体类中存在对应的属性,若存在则 为其赋值*/
                if((field = fieldMap.get(column.toLowerCase())) != null){
                    field.set(obj, Bytes.toString(cell.getValueArray(),
                            cell.getValueOffset(), cell.getValueLength()));
                }
            }
        } catch (InstantiationException e) {
            LOG.error(String.format("解析第%个满足条件的记录%s失败。", rowNum, result), e);
        } catch (IllegalAccessException e) {
            LOG.error(String.format("解析第%s个满足条件的记录%s失败。", rowNum, result), e);
        }
        return obj;
    }

    private  Map<String,Field>  getDeclaredFields(Class<?> clazz){
        Field[] fields = clazz.getDeclaredFields();
        Field field = null;
        Map<String,Field> fieldMap = Maps.newHashMapWithExpectedSize(fields.length);

        for(int i = 0; i < fields.length; i++){
            field = fields[i];
            //修饰符值为2即仅被private修饰的实例字段(Modifier.PRIVATE),只为这类字段建立映射
            if(field.getModifiers() == 2){
                field.setAccessible(true);
                fieldMap.put(field.getName().toLowerCase(), field);
            }
        }
        fields = null;

        return fieldMap;
    }
}

com/hsiehchou/hbase/extractor/CellNumExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.client.Result;

import java.io.IOException;

public class CellNumExtrator implements RowExtractor<Integer> {

    public Integer extractRowData(Result result, int rowNum) throws IOException {
        return  result.listCells().size();
    }
}

com/hsiehchou/hbase/extractor/MapLongRowExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MapLongRowExtrator implements RowExtractor<Map<String,Long>> {

    private Map<String,Long> row;

    @Override
    public Map<String, Long> extractRowData(Result result, int rowNum) throws IOException {

        row = new HashMap<String,Long>();

        for(Cell cell :  result.listCells()) {
            row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toLong(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
        }
        return row;
    }
}

com/hsiehchou/hbase/extractor/MapRowExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class MapRowExtrator implements RowExtractor<Map<String,String>>,Serializable {

    private static final long serialVersionUID = 1543027485077396235L;

    private Map<String,String> row;

    @Override
    public Map<String, String> extractRowData(Result result, int rowNum) throws IOException {

        row = new HashMap<String,String>();

        for(Cell cell :  result.listCells()) {
            row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
        }
        return row;
    }
}

com/hsiehchou/hbase/extractor/MultiVersionRowExtrator.java

package com.hsiehchou.hbase.extractor;

import com.hsiehchou.hbase.entity.HBaseRow;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class MultiVersionRowExtrator implements RowExtractor<HBaseRow>{

    private HBaseRow row;

    public HBaseRow extractRowData(Result result, int rowNum) throws IOException {

        row = new HBaseRow(Bytes.toString(result.getRow()));

        String field = null;
        String value = null;
        long capTime = 0L;
        for(Cell cell : result.listCells()){
            field = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
            capTime = cell.getTimestamp();

            row.addCell(field, value, capTime);
        }
        return  row ;
    }
}

com/hsiehchou/hbase/extractor/OneColumnRowByteExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.client.Result;

import java.io.IOException;
import java.io.Serializable;

public class OneColumnRowByteExtrator implements RowExtractor<byte[]> ,Serializable{

    private static final long serialVersionUID = -3420092335124240222L;

    private byte[] cf;
    private byte[] cl;

    public OneColumnRowByteExtrator( byte[] cf,byte[] cl ){
        this.cf = cf;
        this.cl = cl;
    }

    public byte[] extractRowData(Result result, int rowNum) throws IOException {
        return result.getValue(cf, cl);
    }
}

com/hsiehchou/hbase/extractor/OneColumnRowStringExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.io.Serializable;

public class OneColumnRowStringExtrator implements RowExtractor<String>  , Serializable{

    private static final long serialVersionUID = -8585637277902568648L;

    private byte[] cf ;
    private byte[] cl ;

    public OneColumnRowStringExtrator( byte[] cf , byte[] cl ){
        this.cf = cf;
        this.cl = cl;
    }

    @Override
    public String extractRowData(Result result, int rowNum) throws IOException {

        byte[] value = result.getValue(cf, cl);
        if( value == null ) return null;

        return  Bytes.toString( value ) ;
    }
}

com/hsiehchou/hbase/extractor/OnlyRowKeyExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.client.Result;

import java.io.IOException;

public class OnlyRowKeyExtrator implements RowExtractor<byte[]> {

    @Override
    public byte[] extractRowData(Result result, int rowNum) throws IOException {
        return result.getRow();
    }
}

com/hsiehchou/hbase/extractor/OnlyRowKeyStringExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class OnlyRowKeyStringExtrator implements RowExtractor<String> {

    public String extractRowData(Result result, int rowNum) throws IOException {
        return Bytes.toString( result.getRow() );
    }
}

com/hsiehchou/hbase/extractor/RowExtractor.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.client.Result;

import java.io.IOException;

public interface RowExtractor<T>  {

    /**
      * description: 从一条查询结果 Result 中解析出类型为 T 的一行数据
      * @param result  hbase 返回的一行查询结果
      * @param rowNum  该结果在本次查询中的序号
      * @return T
      * @throws IOException
     */
    T extractRowData(Result result, int rowNum) throws IOException;
}
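
各种 Extrator 都实现这个接口:扫描时把每条 Result 交给 extractRowData,rowNum 传当前序号即可。下面是一个通用的扫描示意(表名沿用上文示例并假设已存在,ScanWithExtractorDemo 只是演示用的类名):

import com.hsiehchou.hbase.config.HBaseTableUtil;
import com.hsiehchou.hbase.extractor.MapRowExtrator;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ScanWithExtractorDemo {

    //把一次 scan 的每行 Result 都交给同一个解析器,返回解析结果列表
    public static <T> List<T> scan(String tableName, Scan scan, RowExtractor<T> extractor) throws Exception {
        List<T> rows = new ArrayList<T>();
        Table table = HBaseTableUtil.getTable(tableName);
        ResultScanner scanner = table.getScanner(scan);
        try {
            int rowNum = 0;
            for (Result result : scanner) {
                rows.add(extractor.extractRowData(result, rowNum++));
            }
        } finally {
            scanner.close();
            HBaseTableUtil.close(table);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        //MapRowExtrator 把每行解析成 字段->值 的 Map
        List<Map<String, String>> rows = scan("test:chl_test2", new Scan(), new MapRowExtrator());
        System.out.println("共扫描到 " + rows.size() + " 行");
    }
}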

com/hsiehchou/hbase/extractor/SingleColumnMultiVersionRowExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.Set;

public class SingleColumnMultiVersionRowExtrator implements RowExtractor<Set<String>>{

    private Set<String> values;
    private byte[] cf;
    private byte[] cl;


    /**
     * 单列解析器  获取hbase 单列多版本数据
     * @param cf     列簇
     * @param cl     列
     * @param values 返回值
     */
    public SingleColumnMultiVersionRowExtrator(byte[] cf, byte[] cl, Set<String> values){
        this.cf = cf;
        this.cl = cl;
        this.values = values;
    }

    public Set<String> extractRowData(Result result, int rowNum) throws IOException {

        for(Cell cell : result.getColumnCells(cf, cl)){
            values.add(Bytes.toString(cell.getValueArray(),cell.getValueOffset(), cell.getValueLength()));
        }
        return values;
    }

}
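
取单列多版本数据时,Get/Scan 需要显式放开版本数,否则默认只返回最新一个版本。示意如下(表名沿用上文示例,列簇、列名、rowkey 为假设值,MultiVersionDemo 只是演示用的类名):

import com.hsiehchou.hbase.config.HBaseTableUtil;
import com.hsiehchou.hbase.extractor.SingleColumnMultiVersionRowExtrator;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.HashSet;
import java.util.Set;

public class MultiVersionDemo {

    public static void main(String[] args) throws Exception {
        Table table = HBaseTableUtil.getTable("test:chl_test2");   //假设该表已存在

        Get get = new Get(Bytes.toBytes("demo_rowkey"));   //rowkey 为假设值
        get.setMaxVersions();                              //返回该行保留的所有版本
        Result result = table.get(get);

        //解析 cf:phone 这一列的所有版本值,结果写入传入的 Set 中
        Set<String> values = new HashSet<String>();
        new SingleColumnMultiVersionRowExtrator(Bytes.toBytes("cf"), Bytes.toBytes("phone"), values)
                .extractRowData(result, 0);

        System.out.println(values);
        HBaseTableUtil.close(table);
    }
}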

com/hsiehchou/hbase/extractor/StrToByteExtrator.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class StrToByteExtrator implements RowExtractor<Map<String,byte[]>> ,Serializable {

    private static final long serialVersionUID = 4633698173362569711L;

    private Map<String,byte[]> row;

    @Override
    public Map<String, byte[]> extractRowData(Result result, int rowNum) throws IOException {

        row = new HashMap<String,byte[]>();

        for(Cell cell :  result.listCells()) {
            row.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),
                    Bytes.copy(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
        }
        return row;
    }
}

com/hsiehchou/hbase/extractor/ToRowList.java

package com.hsiehchou.hbase.extractor;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/**
 * Hbase数据库中数据提取接口实现:
 * 提取result的rowKey,和每个cell的值作为一行数据,
 * 一个cell=(row, family:qualifier:value, version)
 *
 * <p>
 * 每行数据的格式为:{rowKey column${separator}value column${separator}value ...}
 * 其中,不同的列之间用空格分隔,同样列元素的描述符与值之间用${separator}分隔
 */
public class ToRowList implements RowExtractor<List<String>> {

    private Boolean currentVersion; //currentVersion为true:只取当前最新版本,false:取所有版本
    private char separator; //不同元素之间拼接时的分隔符,默认为`#`

    private ToRowList(Boolean currentVersion, char separator) {
        this.separator = separator;
        this.currentVersion = currentVersion;
    }

    public ToRowList(Boolean currentVersion) {
        this(currentVersion, '#');
    }

    public ToRowList() {
        this(true, '#');
    }

    /**
      * 对{当前版本}存放在list[0] = {rowKey` `column`#`value` `column`#`value ...}
      * 多版本的时候list({rowKey`#`version1` `column`#`value` `column`#`value ...},
      * {rowKey`#`version2` `column`#`value` `column`#`value ...})
      */
    @Override
    public List<String> extractRowData(Result result, int rowNum) throws IOException {
        if(result == null || result.isEmpty()) return null;

        final char SPACE = ' ';

        List<String> rows = new LinkedList<>();

        //一个result是同一个rowKey的所有cells集合
        String rowKey = Bytes.toString(result.getRow());

        //build rowKey` `column`#`value` `column`#`value ...
        StringBuilder row = new StringBuilder();
        row.append(rowKey).append(SPACE);

        //用于处理不同版本的映射
        Map<Long, String> version2qualifiersAndValues = new HashMap<>();

        List<Cell> cells = result.listCells();
        for (Cell cell : cells) {
            String value = Bytes.toString(cell.getValueArray(),
                    cell.getValueOffset(), cell.getValueLength());
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));

            if (currentVersion) {
                row.append(qualifier).append(separator).append(value).append(SPACE);
            } else {
                Long version = cell.getTimestamp();
                String tmp = version2qualifiersAndValues.get(version);
                version2qualifiersAndValues.put(version,
                        StringUtils.isNotBlank(tmp) ? tmp + " " + qualifier + separator + value
                                : rowKey + separator + version + " " + qualifier + separator + value);
            }
        }

        if (currentVersion) {
            rows.add(row.toString());
        } else {
            for (String v : version2qualifiersAndValues.values()) {
                rows.add(v);
            }
        }

        return rows;
    }
}

com/hsiehchou/hbase/extractor/ToRowMap.java

package com.hsiehchou.hbase.extractor;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * currentVersion 标识是否取多版本的数据,默认取当前版本
 * 对当前版本,返回row`#`qualifier->value的映射
 * 对多个版本,返回row`#`version`#`qualifier->value的映射
 */
public class ToRowMap implements RowExtractor<Map<String, String>> {

    private Boolean currentVersion;

    public ToRowMap() {
        this(true);
    }

    private ToRowMap(Boolean currentVersion) {
        this.currentVersion = currentVersion;
    }

    @Override
    public Map<String, String> extractRowData(Result result, int rowNum)
            throws IOException {
        if(result == null || result.isEmpty()) return null;

        final char HashTag = '#';

        HashMap<String, String> col2value = new HashMap<>();

        String rowKey = Bytes.toString(result.getRow());

        for (Cell cell : result.listCells()) {
            String value = Bytes.toString(cell.getValueArray(),
                    cell.getValueOffset(), cell.getValueLength());
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            if (currentVersion)
                col2value.put(rowKey + HashTag + qualifier, value);
            else {
                long version = cell.getTimestamp();
                col2value.put(rowKey + HashTag + version + HashTag + qualifier, value);
            }
        }

        return col2value;
    }
}

com/hsiehchou/hbase/insert/HBaseInsertException.java

package com.hsiehchou.hbase.insert;

import java.util.Iterator;

public class HBaseInsertException extends Exception{
    public HBaseInsertException(String message) {
        super(message);
    }

    public final synchronized void addSuppresseds(Iterable<Exception> exceptions){

        if(exceptions != null){
            Iterator<Exception> iterator = exceptions.iterator();
            while (iterator.hasNext()){
                addSuppressed(iterator.next());
            }
        }
    }
}

com/hsiehchou/hbase/insert/HBaseInsertHelper.java

package com.hsiehchou.hbase.insert;

import com.hsiehchou.hbase.config.HBaseTableUtil;
import com.google.common.collect.Lists;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * 添加HBASE 插入数据类
 */
public class HBaseInsertHelper implements Serializable{

    private HBaseInsertHelper(){}

    public static void put(String tableName, Put put) throws Exception {
        put(tableName, Lists.newArrayList(put));
    }

    public static void put(String tableName, List<Put> puts) throws Exception {
        if(!puts.isEmpty()){
            Table table = HBaseTableUtil.getTable(tableName);
            try {
                table.put(puts);
            }catch (Exception e){
                e.printStackTrace();
            }finally {
                HBaseTableUtil.close(table);
            }
        }
     }

    public static void put(final String tableName, List<Put> puts, int perThreadPutSize) throws Exception {

        int size = puts.size();
        if(size > perThreadPutSize){

            int threadNum = (int)Math.ceil(size / (double)perThreadPutSize);
            ExecutorService executorService = Executors.newFixedThreadPool(threadNum);

            final CountDownLatch  cdl = new CountDownLatch(threadNum);
            final List<Exception>  es = Collections.synchronizedList(new ArrayList<Exception>());

            try {
                for(int i = 0; i < threadNum; i++){
                    final List<Put> tmp;
                    if(i == (threadNum - 1)){
                        tmp = puts.subList(perThreadPutSize*i, size);
                    }else{
                        tmp = puts.subList(perThreadPutSize*i, perThreadPutSize*(i + 1));
                    }
                    executorService.execute(new Runnable() {
                        public void run() {
                            try {
                                if(es.isEmpty()) put(tableName, tmp);
                            } catch (Exception e) {
                                es.add(e);
                            }finally {
                                cdl.countDown();
                            }
                        }
                    });
                }
                cdl.await();
            }finally {
                executorService.shutdown();
            }
            if(es.size() > 0){
                HBaseInsertException insertException = new HBaseInsertException(String.format("Failed to put data into table %s.", tableName));
                insertException.addSuppresseds(es);
                throw insertException;
            }
        }else {
            put(tableName, puts);
        }
    }

    public static void checkAndPut(String tableName, byte[] row, byte[] family, byte[] qualifier,
                                   byte[] value, Put put) throws Exception {
        checkAndPut(tableName, row, family, qualifier, null, value, put);
    }

    public static void checkAndPut(String tableName, byte[] row, byte[] family, byte[] qualifier,
                                   CompareOp compareOp, byte[] value, Put put) throws Exception {

        if(!put.isEmpty() ){
            Table table = HBaseTableUtil.getTable(tableName);
            try {
                if(compareOp == null){
                    table.checkAndPut(row, family, qualifier, value, put);
                }else{
                    table.checkAndPut(row, family, qualifier, compareOp, value, put);
                }
            }finally{
                HBaseTableUtil.close(table);
            }
        }
    }
}
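
The helper above is easiest to understand from a call site. The following is a minimal usage sketch (not part of the project source; the table name test:relation, the column family cf and the sample row keys/values are assumptions) showing both the plain overload and the threaded overload that caps each worker thread at 1000 Puts:

// Hypothetical usage sketch for HBaseInsertHelper; assumes table test:relation with column family cf exists
import com.hsiehchou.hbase.insert.HBaseInsertHelper;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.ArrayList;
import java.util.List;

public class HBaseInsertHelperDemo {

    public static void main(String[] args) throws Exception {
        List<Put> puts = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            // row key: a MAC-like string; one column cf:phone per Put
            Put put = new Put(Bytes.toBytes("aa-aa-aa-aa-aa-" + String.format("%02x", i % 256)));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("phone"), Bytes.toBytes("1860976" + i));
            puts.add(put);
        }

        // single-threaded write of the whole list
        HBaseInsertHelper.put("test:relation", puts);

        // threaded write: the list is split so that each worker thread handles at most 1000 Puts
        HBaseInsertHelper.put("test:relation", puts, 1000);
    }
}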

com/hsiehchou/hbase/search/HBaseSearchService.java

package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Scan;

import java.io.IOException;
import java.util.List;
import java.util.Map;


public interface HBaseSearchService {



    /**
      *  根据  用户 给定的解析类  解析  查询结果
      * @param tableName
      * @param scan
      * @param extractor  用户自定义的 结果解析 类
      * @return
      * @throws IOException
      * List<T>
     */
    <T> List<T> search(String tableName, Scan scan, RowExtractor<T> extractor) throws IOException;

    /**
      * 当存在多个  scan时  采用多线程查询
      * @param tableName
      * @param scans
      * @param extractor  用户自定义的 结果解析 类
      * @return
      * @throws IOException
      * List<T>
     */
    <T> List<T> searchMore(String tableName, List<Scan> scans, RowExtractor<T> extractor) throws IOException;

    /**
      * 采用多线程  同时查询多个表
      * @param more
      * @return
      * @throws IOException
      * List<T>
     */
    <T> Map<String,List<T>> searchMore(List<SearchMoreTable<T>> more) throws IOException;

    /**
      * 利用反射  自动封装实体类
      * @param tableName
      * @param scan    
      * @param cls   HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
      * @return
      * @throws IOException
      * @throws InstantiationException
      * @throws IllegalAccessException
      * List<T>
     */
    <T> List<T> search(String tableName, Scan scan, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;

    /**
      * 当存在多个 scan 时  采用多线程查询
      * @param tableName
      * @param scans
      * @param cls   HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
      * @return
      * @throws IOException
      * @throws InstantiationException
      * @throws IllegalAccessException
      * List<T>
     */
    <T> List<T> searchMore(String tableName, List<Scan> scans, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;


    /**
      * 批量 get 查询  并按自定义的方式解析结果集
      * @param tableName
      * @param gets
      * @param extractor  用户自定义的 结果解析 类
      * @return
      * @throws IOException
      * List<T>
     */
    <T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException;

    /**
      * 多线程批量get, 并按自定义的方式解析结果集
      * 建议 : perThreadExtractorGetNum >= 100
      * @param tableName
      * @param gets
      * @param perThreadExtractorGetNum    每个线程处理的 get的个数 
      * @param extractor  用户自定义的 结果解析 类
      * @return
      * @throws IOException
      * List<T>
     */
    <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, RowExtractor<T> extractor) throws IOException;

    /**
      * 批量 get 查询  并利用反射 封装到指定的实体类中
      * @param tableName
      * @param gets
      * @param  cls   HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
      * @return      
      * @throws IOException
      * @throws InstantiationException
      * List<T>
     */
    <T> List<T> search(String tableName, List<Get> gets, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;

    /**
      * 多线程批量 get 查询  并利用反射 封装到指定的实体类中
      * 建议 : perThreadExtractorGetNum >= 100
      * @param tableName
      * @param gets
      * @param perThreadExtractorGetNum  每个线程处理的 get的个数 
      * @param  cls   HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
      * @return
      * @throws IOException
      * @throws InstantiationException
      * @throws IllegalAccessException
      * List<T>
     */
    <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;

    /**
      * get 查询  并按自定义的方式解析结果集
      * @param tableName
      * @param extractor   用户自定义的 结果解析 类
      * @return     如果 查询不到  则 返回  null
      * @throws IOException
      * List<T>
     */
    <T> T search(String tableName, Get get, RowExtractor<T> extractor) throws IOException;

    /**
      * get 查询  并利用反射 封装到指定的实体类中
      * @param tableName
      * @param  cls   HBase表对应的实体类,属性只包含对应表的 列 , 不区分大小写
      * @return     如果 查询不到  则 返回  null
      * @throws IOException
      * @throws InstantiationException
      * List<T>
     */
    <T> T search(String tableName, Get get, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException;

}
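
To make the Class&lt;T&gt; overloads above concrete, a hypothetical entity class could look like the sketch below: its field names mirror the column qualifiers of the table (matched case-insensitively), which is all the reflection-based mapping needs. This class is illustrative only and not part of the project source.

// Hypothetical entity class (not in the project): field names mirror HBase column qualifiers
// such as phone_mac, longitude, latitude and collect_time, matched case-insensitively.
public class TrackPoint {
    private String phone_mac;
    private String longitude;
    private String latitude;
    private String collect_time;
    // getters/setters omitted in this sketch
}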

com/hsiehchou/hbase/search/HBaseSearchServiceImpl.java

package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.config.HBaseTableFactory;
import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;


public class HBaseSearchServiceImpl implements HBaseSearchService,Serializable{

    private static final long serialVersionUID = -8657479861137115645L;

    private static final Logger LOG = LoggerFactory.getLogger(HBaseSearchServiceImpl.class);

    private HBaseTableFactory factory = new HBaseTableFactory();
    private int poolCapacity = 6;
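    // NOTE: in this tutorial only the Get-based search paths further down are implemented;
    // the Scan-based and reflection-based overloads are left as stubs that return null.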


    @Override
    public <T> List<T> search(String tableName, Scan scan, RowExtractor<T> extractor) throws IOException {
        return null;
    }

    @Override
    public <T> List<T> searchMore(String tableName, List<Scan> scans, RowExtractor<T> extractor) throws IOException {
        return null;
    }

    @Override
    public <T> Map<String, List<T>> searchMore(List<SearchMoreTable<T>> more) throws IOException {
        return null;
    }

    @Override
    public <T> List<T> search(String tableName, Scan scan, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
        return null;
    }

    @Override
    public <T> List<T> searchMore(String tableName, List<Scan> scans, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
        return null;
    }

    @Override
    public <T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException {
        List<T> data = new ArrayList<T>();
        search(tableName, gets, extractor,data);
        return data;
    }

    @Override
    public <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, RowExtractor<T> extractor) throws IOException {
        return null;
    }

    @Override
    public <T> List<T> search(String tableName, List<Get> gets, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
        return null;
    }

    @Override
    public <T> List<T> searchMore(String tableName, List<Get> gets, int perThreadExtractorGetNum, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
        return null;
    }

    @Override
    public <T> T search(String tableName, Get get, RowExtractor<T> extractor) throws IOException {

        T obj = null;
        List<T> res = search(tableName,Arrays.asList(get),extractor);
        if( !res.isEmpty()){
            obj = res.get(0);
        }

        return obj;
    }

    @Override
    public <T> T search(String tableName, Get get, Class<T> cls) throws IOException, InstantiationException, IllegalAccessException {
        return null;
    }

    private <T> void search(String tableName, List<Get> gets,
                            RowExtractor<T> extractor , List<T> data ) throws IOException {

        //根据table名获取表连接
        Table table = factory.getHBaseTableInstance(tableName);
        if(table != null ){
            Result[] results = table.get(gets);
            int n = 0;
            T row = null;
            for( Result result : results){
                if( !result.isEmpty() ){
                    row = extractor.extractRowData(result, n);
                    if(row != null )data.add(row);
                    n++;
                }
            }
            close( table, null);
        }else{
            throw new IOException("table " + tableName + " does not exist");
        }
    }

    public static boolean  existsRowkey( Table table, String rowkey){
        boolean exists =true;
        try {
            exists = table.exists(new Get(rowkey.getBytes()));
        } catch (IOException e) {
            LOG.error("失败。", e );
        }
        return exists;
    }

    public static void close(Table table, ResultScanner scanner){

        try {
            if (table != null) {
                table.close();
            }
            if (scanner != null) {
                scanner.close();
            }
        } catch (IOException e) {
            // do not dereference table in the log message: it may be null when only the scanner fails to close
            LOG.error("Failed to close HBase table resources.", e);
        }

    }
}
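
A minimal usage sketch of the implemented Get-based path (a hypothetical demo class; the table name test:relation and the example MAC row keys are assumptions): each Result is flattened by ToRowMap into rowKey#qualifier -> value entries.

// Hypothetical usage sketch for HBaseSearchServiceImpl.search(tableName, gets, extractor)
import com.hsiehchou.hbase.extractor.ToRowMap;
import com.hsiehchou.hbase.search.HBaseSearchService;
import com.hsiehchou.hbase.search.HBaseSearchServiceImpl;
import org.apache.hadoop.hbase.client.Get;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HBaseSearchDemo {

    public static void main(String[] args) throws IOException {
        HBaseSearchService searchService = new HBaseSearchServiceImpl();

        // build one Get per row key (example MAC addresses)
        List<Get> gets = new ArrayList<>();
        gets.add(new Get("aa-aa-aa-aa-aa-aa".getBytes()));
        gets.add(new Get("1c-41-cd-b1-df-3f".getBytes()));

        // each Result is flattened to a map of "rowKey#qualifier" -> value by ToRowMap
        List<Map<String, String>> rows = searchService.search("test:relation", gets, new ToRowMap());
        rows.forEach(System.out::println);
    }
}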

com/hsiehchou/hbase/search/SearchMoreTable.java

package com.hsiehchou.hbase.search;

import com.hsiehchou.hbase.extractor.RowExtractor;
import org.apache.hadoop.hbase.client.Scan;

public class SearchMoreTable<T> {

    private String tableName;
    private Scan scan;
    private RowExtractor<T> extractor;

    public SearchMoreTable() {
        super();
    }

    public SearchMoreTable(String tableName, Scan scan,
            RowExtractor<T> extractor) {
        super();
        this.tableName = tableName;
        this.scan = scan;
        this.extractor = extractor;
    }

    public String getTableName() {
        return tableName;
    }
    public void setTableName(String tableName) {
        this.tableName = tableName;
    }
    public Scan getScan() {
        return scan;
    }
    public void setScan(Scan scan) {
        this.scan = scan;
    }
    public RowExtractor<T> getExtractor() {
        return extractor;
    }
    public void setExtractor(RowExtractor<T> extractor) {
        this.extractor = extractor;
    }
}

com/hsiehchou/hbase/spilt/SpiltRegionUtil.java

package com.hsiehchou.hbase.spilt;

import org.apache.hadoop.hbase.util.Bytes;

import java.util.Iterator;
import java.util.TreeSet;

/**
 * hbase 预分区
 */
public class SpiltRegionUtil {

    /**
     * 定义分区
     * @return
     */
    public static byte[][] getSplitKeysBydinct() {

        String[] keys = new String[]{"1","2", "3","4", "5","6", "7","8", "9","a","b", "c","d","e","f"};
        //String[] keys = new String[]{"10|", "20|", "30|", "40|", "50|", "60|", "70|", "80|", "90|"};
        byte[][] splitKeys = new byte[keys.length][];

        //通过treeset排序
        TreeSet<byte[]> rows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);//升序排序
        for (int i = 0; i < keys.length; i++) {
            rows.add(Bytes.toBytes(keys[i]));
        }
        Iterator<byte[]> rowKeyIter = rows.iterator();
        int i = 0;
        while (rowKeyIter.hasNext()) {
            byte[] tempRow = rowKeyIter.next();
            rowKeyIter.remove();
            splitKeys[i] = tempRow;
            i++;
        }
        return splitKeys;
    }
}
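
As a hedged illustration of how these split keys might be consumed (this call site is not shown in the project), the table could be created pre-split with the HBase Admin API; the namespace test, the table name and the column family below are assumptions:

// Hypothetical sketch: create a pre-split table using the split keys above
// (assumes the "test" namespace already exists and connection settings come from hbase-site.xml)
import com.hsiehchou.hbase.spilt.SpiltRegionUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreatePreSplitTableDemo {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("test:relation"));
            HColumnDescriptor cf = new HColumnDescriptor("cf");
            cf.setMaxVersions(100);   // keep multiple versions, as the query services rely on them
            desc.addFamily(cf);

            // 15 split keys (1-9, a-f) -> 16 regions, matching the first hex character of the MAC-style row key
            admin.createTable(desc, SpiltRegionUtil.getSplitKeysBydinct());
        }
    }
}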

6、执行

spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hbase.DataRelationStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar

7、执行截图

hbase_list

hbase_scan

hbase写入数据

十二、SpringCloud 项目构建

SpringCloud微服务

服务注册

解决IntelliJ IDEA 创建Maven项目速度慢问题
add Maven Property
Name:archetypeCatalog
Value:internal

1、构建SpringCloud父项目

在原项目下新建 xz_bigdata_springcloud_dir目录

新建 xz_bigdata_springcloud_dir目录

2、Create the xz_bigdata_springcloud_root project under this directory

新建 xz_bigdata_springcloud_root项目

3、 引入SpringCloud依赖

父pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <modules>
    <module>xz_bigdata_springcloud_common</module>
    <module>xz_bigdata_springcloud_esquery</module>
    <module>xz_bigdata_springcloud_eureka</module>
    <module>xz_bigdata_springcloud_hbasequery</module>
  </modules>

  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.9.RELEASE</version>
  </parent>

  <groupId>com.hsiehchou.springcloud</groupId>
  <artifactId>xz_bigdata_springcloud_root</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <name>xz_bigdata_springcloud_root</name>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <!--CDH源-->
  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>
  <!--依赖管理,用于管理spring-cloud的依赖-->
  <dependencyManagement>
    <dependencies>
      <!--spring-cloud-dependencies-->
      <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-dependencies</artifactId>
        <version>Finchley.SR1</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
  <!--打包插件-->
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Delete the src directory of the parent project, since this project only aggregates and manages the submodules and contains no business logic of its own.

4、构建SpringCloud Common子项目

新建子模块
xz_bigdata_springcloud_common

引入依赖

pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata_springcloud_root</artifactId>
        <groupId>com.hsiehchou.springcloud</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_springcloud_common</artifactId>

    <name>xz_bigdata_springcloud_common</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <!--eureka-server-->
        <!-- https://mvnrepository.com/artifact/org.springframework.cloud/spring-cloud-starter-eureka-server -->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-eureka-server</artifactId>
            <exclusions>
                <exclusion>
                    <artifactId>HdrHistogram</artifactId>
                    <groupId>org.hdrhistogram</groupId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.24</version>
        </dependency>
    </dependencies>
</project>

5、构建Eureka服务注册中心

新建xz_bigdata_springcloud_eureka子模块

新建xz_bigdata_springcloud_eureka子模块

引入依赖

pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata_springcloud_root</artifactId>
        <groupId>com.hsiehchou.springcloud</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_springcloud_eureka</artifactId>

    <name>xz_bigdata_springcloud_eureka</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.hsiehchou.springcloud</groupId>
            <artifactId>xz_bigdata_springcloud_common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <!--用户验证-->
  <!--      <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-security</artifactId>
            <version>1.4.1.RELEASE</version>
        </dependency>-->
    </dependencies>


    <build>
        <plugins>
            <plugin><!--打包依赖的jar包-->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <outputDirectory>${project.build.directory}/lib</outputDirectory>
                    <excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
                    <stripVersion>false</stripVersion> <!-- 去除版本信息 -->
                </configuration>

                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <!-- 拷贝项目依赖包到lib/目录下 -->
                            <outputDirectory>${project.build.directory}/jars</outputDirectory>
                            <excludeTransitive>false</excludeTransitive>
                            <stripVersion>false</stripVersion>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <!-- 打成jar包插件 -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.5</version>
                <configuration>
                    <archive>
                        <!--
                        生成的jar中,不要包含pom.xml和pom.properties这两个文件
                    -->
                        <addMavenDescriptor>false</addMavenDescriptor>
                        <!-- 生成MANIFEST.MF的设置 -->
                        <manifest>
                            <!-- 为依赖包添加路径, 这些路径会写在MANIFEST文件的Class-Path下 -->
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>jars/</classpathPrefix>
                            <!-- jar启动入口类-->
                            <mainClass>com.hsiehchou.springcloud.eureka.EurekaApplication</mainClass>
                        </manifest>
                        <!--       <manifestEntries>
                                   &lt;!&ndash; 在Class-Path下添加配置文件的路径 &ndash;&gt;
                                   <Class-Path></Class-Path>
                               </manifestEntries>-->
                    </archive>
                    <outputDirectory>${project.build.directory}/</outputDirectory>
                    <includes>
                        <!-- 打jar包时,只打包class文件 -->
                        <include>**/*.class</include>
                        <include>**/*.properties</include>
                        <include>**/*.yml</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

新建resources配置文件目录,添加application.yml配置文件或者 application.properties

application.yml

server:
  port: 8761
eureka:
  client:
    register-with-eureka: false
    fetch-registry: false
    service-url:
      defaultZone: http://root:root@hadoop3:8761/eureka/

xz_bigdata_springcloud_eureka结构

新建EurekaApplication 启动类

EurekaApplication.java

package com.hsiehchou.springcloud.eureka;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

/**
 * 注册中心
 */
@SpringBootApplication
@EnableEurekaServer
public class EurekaApplication
{
    public static void main( String[] args )
    {
        SpringApplication.run(EurekaApplication.class, args);
    }
}

执行EurekaApplication 启动

访问localhost:8761

访问hadoop3:8761

6、构建HBase查询服务模块

Create the xz_bigdata_springcloud_hbasequery submodule

Create the xz_bigdata_springcloud_hbasequery submodule

添加依赖

pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>xz_bigdata_springcloud_root</artifactId>
        <groupId>com.hsiehchou.springcloud</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>xz_bigdata_springcloud_hbasequery</artifactId>

    <name>xz_bigdata_springcloud_hbasequery</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <!--spring common依赖-->
        <dependency>
            <groupId>com.hsiehchou.springcloud</groupId>
            <artifactId>xz_bigdata_springcloud_common</artifactId>
            <version>1.0-SNAPSHOT</version>
            <exclusions>
                <exclusion>
                    <artifactId>HdrHistogram</artifactId>
                    <groupId>org.hdrhistogram</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <!--基础服务hbase依赖-->
        <dependency>
            <groupId>com.hsiehchou</groupId>
            <artifactId>xz_bigdata_hbase</artifactId>
            <version>1.0-SNAPSHOT</version>
            <exclusions>
                <exclusion>
                    <artifactId>fastjson</artifactId>
                    <groupId>com.alibaba</groupId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin><!--打包依赖的jar包-->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <outputDirectory>${project.build.directory}/lib</outputDirectory>
                    <excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
                    <stripVersion>false</stripVersion> <!-- 去除版本信息 -->
                </configuration>

                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <!-- 拷贝项目依赖包到lib/目录下 -->
                            <outputDirectory>${project.build.directory}/jars</outputDirectory>
                            <excludeTransitive>false</excludeTransitive>
                            <stripVersion>false</stripVersion>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <!-- 打成jar包插件 -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.5</version>
                <configuration>
                    <archive>
                        <!--
                        生成的jar中,不要包含pom.xml和pom.properties这两个文件
                    -->
                        <addMavenDescriptor>false</addMavenDescriptor>
                        <!-- 生成MANIFEST.MF的设置 -->
                        <manifest>
                            <!-- 为依赖包添加路径, 这些路径会写在MANIFEST文件的Class-Path下 -->
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>jars/</classpathPrefix>
                            <!-- jar启动入口类-->
                            <mainClass>com.hsiehchou.springcloud.HbaseQueryApplication</mainClass>
                        </manifest>
                        <!--       <manifestEntries>
                                   &lt;!&ndash; 在Class-Path下添加配置文件的路径 &ndash;&gt;
                                   <Class-Path></Class-Path>
                               </manifestEntries>-->
                    </archive>
                    <outputDirectory>${project.build.directory}/</outputDirectory>
                    <includes>
                        <!-- 打jar包时,只打包class文件 -->
                        <include>**/*.class</include>
                        <include>**/*.properties</include>
                        <include>**/*.yml</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

添加配置文件

新建 resources 目录
添加 application.properties 文件

server.port=8002

logging.level.root=INFO
logging.level.org.hibernate=INFO
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
logging.level.org.hibernate.type.descriptor.sql.BasicExtractor= TRACE
logging.level.com.itmuch=DEBUG
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true

eureka.client.serviceUrl.defaultZone=http://root:root@hadoop3:8761/eureka/

spring.application.name=xz-bigdata-springcloud-hbasequery
eureka.instance.prefer-ip-address=true

构建启动类

Create the package com.hsiehchou.springcloud
Build the HbaseQueryApplication launcher class in it (the application class must sit above the hbase.controller and hbase.service packages so that component scanning picks them up)

package com.hsiehchou.springcloud;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@SpringBootApplication
@EnableDiscoveryClient
public class HbaseQueryApplication
{
    public static void main( String[] args )
    {
        SpringApplication.run(HbaseQueryApplication.class, args);
    }
}

注册成功
说明注册成功

构建服务

构建Hbase服务

构建 com.hsiehchou.springcloud.hbase.controller

创建 HbaseBaseController

HbaseBaseController.java

package com.hsiehchou.springcloud.hbase.controller;

import com.hsiehchou.hbase.extractor.SingleColumnMultiVersionRowExtrator;
import com.hsiehchou.hbase.search.HBaseSearchService;
import com.hsiehchou.hbase.search.HBaseSearchServiceImpl;
import com.hsiehchou.springcloud.hbase.service.HbaseBaseService;
import org.apache.hadoop.hbase.client.Get;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.*;

import javax.annotation.Resource;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

@Controller
@RequestMapping(value="/hbase")
public class HbaseBaseController {

    private static Logger LOG = LoggerFactory.getLogger(HbaseBaseController.class);


    //注入 通过这个注解可以直接拿到HbaseBaseService这个的实例
    @Resource
    private HbaseBaseService hbaseBaseService;

    @ResponseBody
    @RequestMapping(value="/search/{table}/{rowkey}", method={RequestMethod.GET,RequestMethod.POST})
    public Set<String> search(@PathVariable(value = "table") String table,
                              @PathVariable(value = "rowkey") String rowkey){
        return hbaseBaseService.getSingleColumn(table,rowkey);
    }

    @ResponseBody
    @RequestMapping(value="/search1", method={RequestMethod.GET,RequestMethod.POST})
    public Set<String> search1( @RequestParam(name = "table") String table,
                                @RequestParam(name = "rowkey") String rowkey){
        //通过二级索引去找主关联表的rowkey 这个rowkey就是MAC
        return hbaseBaseService.getSingleColumn(table,rowkey);
    }

    @ResponseBody
    @RequestMapping(value = "/getHbase",method = {RequestMethod.GET,RequestMethod.POST})
    public Set<String> getHbase(@RequestParam(name="table") String table,
                                @RequestParam(name="rowkey") String rowkey){
        return hbaseBaseService.getSingleColumn(table, rowkey);
    }

    @ResponseBody
    @RequestMapping(value = "/getRelation",method = {RequestMethod.GET,RequestMethod.POST})
    public Map<String,List<String>> getRelation(@RequestParam(name = "field") String field,
                                                @RequestParam(name = "fieldValue") String fieldValue){
        return hbaseBaseService.getRealtion(field,fieldValue);
    }

    public static void main(String[] args) {
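        // NOTE: quick local test only. Outside the Spring context hbaseBaseService is not
        // injected, so the call below would throw a NullPointerException.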
        HbaseBaseController hbaseBaseController = new HbaseBaseController();
        hbaseBaseController.getHbase("send_mail", "65497873@qq.com");
    }
}

构建 com.hsiehchou.springcloud.hbase.service

创建 HbaseBaseService

HbaseBaseService.java

package com.hsiehchou.springcloud.hbase.service;

import com.hsiehchou.hbase.entity.HBaseCell;
import com.hsiehchou.hbase.entity.HBaseRow;
import com.hsiehchou.hbase.extractor.MultiVersionRowExtrator;
import com.hsiehchou.hbase.extractor.SingleColumnMultiVersionRowExtrator;
import com.hsiehchou.hbase.search.HBaseSearchService;
import com.hsiehchou.hbase.search.HBaseSearchServiceImpl;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;
import java.util.*;

@Service
public class HbaseBaseService {
    private static Logger LOG = LoggerFactory.getLogger(HbaseBaseService.class);

    @Resource
    private HbaseBaseService hbaseBaseService;

    /**
     * 获取hbase单列数据的多版本信息
     * @param field
     * @param rowkey
     * @return
     */
    public Set<String> getSingleColumn(String field,String rowkey){
        //从索引表中获取总关联表的rowkey,获取phone对应的多版本MAC
        Set<String> search = null;
        HBaseSearchService hBaseSearchService = new HBaseSearchServiceImpl();
        String table = "test:"+field;
        Get get = new Get(rowkey.getBytes());
        try {
            get.setMaxVersions(100);
        } catch (IOException e) {
            e.printStackTrace();
        }
        Set set = new HashSet<String>();
        SingleColumnMultiVersionRowExtrator singleColumnMultiVersionRowExtrator = new SingleColumnMultiVersionRowExtrator("cf".getBytes(), "phone_mac".getBytes(), set);

        try {
            search = hBaseSearchService.search(table, get, singleColumnMultiVersionRowExtrator);
            System.out.println(search.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return search;
    }

    /**
     *  获取单列多版本
     * @param table
     * @param rowkey
     * @param versions
     * @return
     */
    public Set<String> getSingleColumn(String table,String rowkey,int versions){
        Set<String> search = null;
        try {
            HBaseSearchService baseSearchService = new HBaseSearchServiceImpl();
            Get get = new Get(rowkey.getBytes());
            get.setMaxVersions(versions);
            Set set = new HashSet<String>();
            SingleColumnMultiVersionRowExtrator singleColumnMultiVersionRowExtrator = new SingleColumnMultiVersionRowExtrator("cf".getBytes(), "phone_mac".getBytes(), set);
            search = baseSearchService.search(table, get, singleColumnMultiVersionRowExtrator);
        } catch (IOException e) {
            LOG.error(null,e);
        }
        System.out.println(search);
        return search;
    }

    /**
     * 直接通过关联表字段值获取整条记录
     * hbase 二级查找
     * @param field
     * @param fieldValue
     * @return
     */
    public Map<String,List<String>> getRealtion(String field,String fieldValue){

        //第一步 从二级索引表中找到多版本的rowkey
        Map<String,List<String>> map = new HashMap<>();

        //首先查找索引表
        //查找的表名
        String table = "test:" + field;
        String indexRowkey = fieldValue;
        HbaseBaseService hbaseBaseService = new HbaseBaseService();
        Set<String> relationRowkeys = hbaseBaseService.getSingleColumn(table, indexRowkey, 100);

        //第二步 拿到二级索引表中得到的 主关联表的rowkey
        //对这些rowkey进行遍历 获取主关联表中rowkey对应的所有多版本数据

        //遍历relationRowkeys,将其封装成List<Get>
        List<Get> list = new ArrayList<>();
        relationRowkeys.forEach(relationRowkey->{
            //通过relationRowkey去找relation表中的所有信息
            Get get = new Get(relationRowkey.getBytes());
            try {
                get.setMaxVersions(100);
            } catch (IOException e) {
                e.printStackTrace();
            }
            list.add(get);
        });

        MultiVersionRowExtrator multiVersionRowExtrator = new MultiVersionRowExtrator();
        HBaseSearchService hBaseSearchService = new HBaseSearchServiceImpl();

        try {
            //<T> List<T> search(String tableName, List<Get> gets, RowExtractor<T> extractor) throws IOException;

            List<HBaseRow> search = hBaseSearchService.search("test:relation", list, multiVersionRowExtrator);
            search.forEach(hbaseRow->{
                Map<String, Collection<HBaseCell>> cellMap = hbaseRow.getCell();
                cellMap.forEach((key,value)->{
                    //把Map<String,Collection<HBaseCell>>转为Map<String,List<String>>
                    List<String> listValue = new ArrayList<>();
                    value.forEach(x->{
                        listValue.add(x.toString());
                    });
                    map.put(key,listValue);
                });
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(map.toString());
     return map;
    }

    public static void main(String[] args) {
        HbaseBaseService hbaseBaseService = new HbaseBaseService();
//        hbaseBaseService.getRealtion("send_mail","65494533@qq.com");
        hbaseBaseService.getSingleColumn("phone","18609765012");
    }
}

7、构建ES查询服务

The Jest API talks to Elasticsearch over HTTP on port 9200.
The dependency is as follows:

<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
    <version>6.3.1</version>
</dependency>

Port 9200 speaks HTTP and is mainly used for external communication.

Port 9300 speaks the native TCP transport protocol; Java transport clients (jar to jar) communicate over it.

Elasticsearch cluster nodes also communicate with each other over 9300.
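
The project wraps Jest behind its own JestService helper, but as a minimal sketch of what happens underneath (the host, port, index name and query body below are assumptions taken from the configuration shown later), a Jest client is built against the 9200 HTTP endpoint and executes a JSON query:

// Minimal Jest usage sketch (the project wraps this in its JestService helper class);
// host/port, index and query are illustrative assumptions only.
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

public class JestDemo {

    public static void main(String[] args) throws Exception {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder("http://192.168.116.201:9200")
                .multiThreaded(true)
                .build());
        JestClient jestClient = factory.getObject();

        // match_all query against the mail index, sent over HTTP (port 9200)
        String query = "{\"query\":{\"match_all\":{}},\"size\":10}";
        Search search = new Search.Builder(query).addIndex("mail").addType("mail").build();
        SearchResult result = jestClient.execute(search);

        System.out.println(result.isSucceeded());
        System.out.println(result.getJsonString());

        jestClient.close();
    }
}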

新建xz_bigdata_springcloud_esquery

新建xz_bigdata_springcloud_esquery子项目

准备

新建 resources 配置文件目录

增加配置文件

application.properties

server.port=8003

logging.level.root=INFO
logging.level.org.hibernate=INFO
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
logging.level.org.hibernate.type.descriptor.sql.BasicExtractor= TRACE
logging.level.com.itmuch=DEBUG
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true

eureka.client.serviceUrl.defaultZone=http://root:root@hadoop3:8761/eureka/

spring.application.name=xz-bigdata-springcloud-esquery
eureka.instance.prefer-ip-address=true


# disable the Elasticsearch health indicator so the service stays UP even if the ES health check fails
management.health.elasticsearch.enabled=false

spring.elasticsearch.jest.uris=http://192.168.116.201:9200


#全部索引
esIndexs=wechat,mail,qq

新建ES微服务启动类

ESqueryApplication.java

package com.hsiehchou.springcloud.es;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.openfeign.EnableFeignClients;

@SpringBootApplication
@EnableDiscoveryClient
@EnableFeignClients
public class ESqueryApplication {
    public static void main(String[] args) {
        SpringApplication.run(ESqueryApplication.class,args);
    }
}

启动 Eureka ES 微服务

注册成功
说明注册成功

ES调用Hbase

构建 com.hsiehchou.springcloud.es.controller

创建 EsBaseController

package com.hsiehchou.springcloud.es.controller;

import com.hsiehchou.springcloud.es.feign.HbaseFeign;
import com.hsiehchou.springcloud.es.service.EsBaseService;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

import javax.annotation.Resource;
import java.util.List;
import java.util.Map;
import java.util.Set;

@Controller
@RequestMapping(value = "/es")
public class EsBaseController {


    @Value("${esIndexs}")
    private String esIndexs;

    @Resource
    private EsBaseService esBaseService;

    @Resource
    private HbaseFeign hbaseFeign;

    /**
     * 基础查询
     * @param indexName
     * @param typeName
     * @param sortField
     * @param sortValue
     * @param pageNumber
     * @param pageSize
     * @return
     */
    @ResponseBody
    @RequestMapping(value = "/getBaseInfo", method = {RequestMethod.GET, RequestMethod.POST})
    public List<Map<String, Object>> getBaseInfo(@RequestParam(name = "indexName") String indexName,
                                                 @RequestParam(name = "typeName") String typeName,
                                                 @RequestParam(name = "sortField") String sortField,
                                                 @RequestParam(name = "sortValue") String sortValue,
                                                 @RequestParam(name = "pageNumber") int pageNumber,
                                                 @RequestParam(name = "pageSize") int pageSize) {
        // 根据数据类型, 排序,分页
        // indexName typeName
        // sortField sortValue
        // pageNumber  pageSize
        return  esBaseService.getBaseInfo(indexName,typeName,sortField,sortValue,pageNumber,pageSize);
    }


    /**
     * 根据任意条件查找轨迹数据
     * @param field
     * @param fieldValue
     * @return
     */
    @ResponseBody
    @RequestMapping(value = "/getLocus", method = {RequestMethod.GET, RequestMethod.POST})
    public List<Map<String, Object>> getLocus(@RequestParam(name = "field") String field,
                                                 @RequestParam(name = "fieldValue") String fieldValue) {

        Set<String> macs = hbaseFeign.search1(field, fieldValue);
        System.out.println(macs.toString());
        // 根据数据类型, 排序,分页
        // indexName typeName
        // sortField sortValue
        // pageNumber  pageSize
        String mac = macs.iterator().next();

        return  esBaseService.getLocus(mac);
    }

    /**
     * 所有表数据总量
     * @return
     */
    @ResponseBody
    @RequestMapping(value="/getAllCount", method={RequestMethod.GET,RequestMethod.POST})
    public Map<String,Long> getAllCount(){
        Map<String, Long> allCount = esBaseService.getAllCount(esIndexs);
        System.out.println(allCount);
        return allCount;
    }

    @ResponseBody
    @RequestMapping(value="/group", method={RequestMethod.GET,RequestMethod.POST})
    public Map<String,Long> group(@RequestParam(name = "indexName") String indexName,
                                  @RequestParam(name = "typeName") String typeName,
                                  @RequestParam(name = "field") String field){
        return esBaseService.aggregation(indexName,typeName,field);
    }


    public static void main(String[] args){
        EsBaseController esBaseController = new EsBaseController();
        esBaseController.getLocus("phone","18609765432");
    }
}

构建 com.hsiehchou.springcloud.es.service

创建 EsBaseService

package com.hsiehchou.springcloud.es.service;

import com.hsiehchou.es.jest.service.JestService;
import com.hsiehchou.es.jest.service.ResultParse;
import io.searchbox.client.JestClient;
import io.searchbox.core.SearchResult;
import org.springframework.stereotype.Service;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Service
public class EsBaseService {

    // 根据数据类型, 排序,分页
    // indexName typeName
    // sortField sortValue
    // pageNumber  pageSize
    public List<Map<String, Object>> getBaseInfo(String indexName,
                                                 String typeName,
                                                 String sortField,
                                                 String sortValue,
                                                 int pageNumber,
                                                 int pageSize) {
        //实现查询
        JestClient jestClient = null;
        List<Map<String, Object>> maps = null;
        try {
            jestClient = JestService.getJestClient();
            SearchResult search = JestService.search(jestClient,
                    indexName,
                    typeName,
                    "",
                    "",
                    sortField,
                    sortValue,
                    pageNumber,
                    pageSize);
            maps = ResultParse.parseSearchResultOnly(search);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            JestService.closeJestClient(jestClient);
        }
        return maps;
    }


    // 传时间范围   比如你要查3天之内的轨迹
    // es中text的类型的可以直接查询,而keyword类型的必须带.keyword,例如,phone_mac.keyword
    public List<Map<String, Object>> getLocus(String mac){
        //实现查询
        JestClient jestClient = null;
        List<Map<String, Object>> maps = null;
        String[] includes = new String[]{"latitude","longitude","collect_time"};
        try {
            jestClient = JestService.getJestClient();
            SearchResult search = JestService.search(jestClient,
                    "",
                    "",
                    "phone_mac.keyword",
                    mac,
                    "collect_time",
                    "asc",
                    1,
                    2000,
                    includes);
            maps = ResultParse.parseSearchResultOnly(search);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            JestService.closeJestClient(jestClient);
        }
        return maps;
    }


     public Map<String,Long> getAllCount(String esIndexs){

        Map<String,Long> countMap = new HashMap<>();
        JestClient jestClient = null;
        try {
            jestClient = JestService.getJestClient();
            String[] split = esIndexs.split(",");
            for (int i = 0; i < split.length; i++) {
                String index = split[i];
                Long count = JestService.count(jestClient, index, index);
                countMap.put(index,count);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }finally {
            JestService.closeJestClient(jestClient);
        }
        return countMap;
    }

    public Map<String,Long> aggregation(String indexName,String typeName,String field){

        JestClient jestClient = null;
        Map<String, Long> stringLongMap = null;
        try {
            jestClient = JestService.getJestClient();
            SearchResult aggregation = JestService.aggregation(jestClient, indexName, typeName, field);
            stringLongMap = ResultParse.parseAggregation(aggregation);
        } catch (Exception e) {
            e.printStackTrace();
        }finally {
            JestService.closeJestClient(jestClient);
        }
        return stringLongMap;
    }
}

这里用到了ES的大数据基础服务

轨迹查询

It calls the HBase service through Spring Cloud Feign.
SpringCloud Feign

Feign is a declarative pseudo-HTTP client that makes writing HTTP clients much simpler: you only create an interface and configure it with annotations, and the binding to the service provider's API is generated for you, which greatly cuts down the amount of code needed on the calling side.

Create the package com.hsiehchou.springcloud.es.feign

创建 HbaseFeign

package com.hsiehchou.springcloud.es.feign;

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.Set;

@FeignClient(name = "xz-bigdata-springcloud-hbasequery")
public interface HbaseFeign {

    @ResponseBody
    @RequestMapping(value="/hbase/search1", method=RequestMethod.GET)
    public Set<String> search1(@RequestParam(name = "table") String table,
                               @RequestParam(name = "rowkey") String rowkey);
}

8、微服务手动部署

Maven添加打包插件

 <build>
        <plugins>
            <plugin><!--打包依赖的jar包-->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <outputDirectory>${project.build.directory}/lib</outputDirectory>
                    <excludeTransitive>false</excludeTransitive> <!-- 表示是否不包含间接依赖的包 -->
                    <stripVersion>false</stripVersion> <!-- 去除版本信息 -->
                </configuration>

                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <!-- 拷贝项目依赖包到lib/目录下 -->
                            <outputDirectory>${project.build.directory}/jars</outputDirectory>
                            <excludeTransitive>false</excludeTransitive>
                            <stripVersion>false</stripVersion>
                        </configuration>
                    </execution>
                </executions>
            </plugin>


            <!-- 打成jar包插件 -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <archive>
                        <!--
                        生成的jar中,不要包含pom.xml和pom.properties这两个文件
                    -->
                        <addMavenDescriptor>false</addMavenDescriptor>
                        <!-- 生成MANIFEST.MF的设置 -->
                        <manifest>
                            <!-- 为依赖包添加路径, 这些路径会写在MANIFEST文件的Class-Path下 -->
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>jars/</classpathPrefix>
                            <!-- jar entry-point class: point this at the module's own Application class
                                 (e.g. com.hsiehchou.springcloud.eureka.EurekaApplication for the eureka module) -->
                            <mainClass>com.hsiehchou.springcloud.eureka.EurekaApplication</mainClass>
                        </manifest>
                        <!--       <manifestEntries>
                                   &lt;!&ndash; 在Class-Path下添加配置文件的路径 &ndash;&gt;
                                   <Class-Path></Class-Path>
                               </manifestEntries>-->
                    </archive>
                    <outputDirectory>${project.build.directory}/</outputDirectory>
                    <includes>
                        <!-- 打jar包时,只打包class文件 -->
                        <include>**/*.class</include>
                        <include>**/*.properties</include>
                        <include>**/*.yml</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>

因为微服务依赖 xz_bigdata2 所以先打包xz_bigdata2

修改配置文件

defaultZone: http://root:root@hadoop3:8761/eureka/

将注册中心 IP 改为部署服务器的IP
微服务同理

上面给出的配置文件已经修改好了

部署

  1. Deploy the Eureka registry first
    Create the directory /usr/chl/springcloud/eureka

Deployment location

Upload the jars directory and the application jar

eureka

  2. Start the registry
    Start the Eureka service registration center:
nohup java -cp /usr/chl/springcloud/eureka/xz_bigdata_springcloud_eureka-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.eureka.EurekaApplication &

View the log

tail -f nohup.out

  3. Deploy esquery
    Start the esquery microservice:
nohup java -cp /usr/chl/springcloud/esquery/xz_bigdata_springcloud_esquery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.es.ESqueryApplication &

  4. Deploy hbasequery
    Start the hbasequery microservice:
nohup java -cp /usr/chl/springcloud/hbasequery/xz_bigdata_springcloud_hbasequery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.HbaseQueryApplication &

9、Execution

Verify the deployed services by requesting the following endpoints (a curl example follows the list):

  1. hadoop3:8002/hbase/getRelation?field=phone&fieldValue=18609765012
  2. hadoop3:8002/hbase/search1?table=phone&rowkey=18609765012
  3. hadoop3:8002/hbase/getHbase?table=send_mail&rowkey=65497873@qq.com
  4. hadoop3:8002/hbase/getHbase?table=phone&rowkey=18609765012
  5. hadoop3:8002/hbase/search/phone/18609765012
  6. hadoop3:8003/es/getAllCount
  7. hadoop3:8003/es/getBaseInfo
  8. hadoop3:8003/es/getLocus
  9. hadoop3:8003/es/group
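These can also be called from the command line; quote the URL so the shell does not interpret the & characters:

curl "http://hadoop3:8002/hbase/getRelation?field=phone&fieldValue=18609765012"
curl "http://hadoop3:8003/es/getAllCount"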

十三、Appendix

1、Test data

mail_source1_1111101.txt

000000000000011    000000000000011    23.000011    24.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300088    65497873@qq.com    1789090763    11111111@qq.com    1789097863    今天出去打球吗    send
000000000000011    000000000000011    24.000011    25.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300085    65497873@qq.com    1789090764    22222222@qq.com    1789097864    今天出去打球吗    send
000000000000011    000000000000011    23.000011    24.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300088    65497873@qq.com    1789090763    33333333@qq.com    1789097863    今天出去打球吗    send
000000000000011    000000000000011    24.000011    25.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300085    65497873@qq.com    1789090764    44444444@qq.com    1789097864    今天出去打球吗    send
000000000000000    000000000000000    23.000001    24.000001    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305988    1323243@qq.com    1789098763    43432543@qq.com    1789098863    今天出去打球吗    send
000000000000000    000000000000000    24.000001    25.000001    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    1323243@qq.com    1789098764    43432543@qq.com    1789098864    今天出去打球吗    send
000000000000000    000000000000000    23.000001    24.000001    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305988    1323243@qq.com    1789098763    43432543@qq.com    1789098863    今天出去打球吗    send
000000000000000    000000000000000    24.000001    25.000001    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    1323243@qq.com    1789098764    43432543@qq.com    1789098864    今天出去打球吗    send

qq_source1_1111101.txt

000000000000000    000000000000000    23.000000    24.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305988    andiy    18609765432    judy            1789098762
000000000000000    000000000000000    24.000000    25.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    andiy    18609765432    judy            1789098763
000000000000000    000000000000000    23.000000    24.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305988    andiy    18609765432    judy            1789098762
000000000000000    000000000000000    24.000000    25.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    andiy    18609765432    judy            1789098763
000000000000011    000000000000011    23.000011    24.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300388    xz    18609765012    ls            1789000653
000000000000011    000000000000011    24.000011    25.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300545    xz    18609765012    ls            1789000343
000000000000011    000000000000011    23.000011    24.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300658    xz    18609765012    ls            1789000542
000000000000011    000000000000011    24.000011    25.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300835    xz    18609765012    ls            1789000263
000000000000011    000000000000011    23.000021    24.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557300388    xz    18609765016    ls            1789001653
000000000000011    000000000000011    24.000021    25.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557302235    xz    18609765016    ls            1789001343
000000000000011    000000000000011    23.000021    24.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557303658    xz    18609765016    ls            1789001542
000000000000011    000000000000011    24.000021    25.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557303835    xz    18609765016    ls            1789001263
000000000000011    000000000000011    23.000031    24.000041    4c-6f-c7-3d-a4-3d    9g-gd-3h-3k-ld-3f    32109246    1557300001    xz    18609765014    ls            1789050653
000000000000011    000000000000011    24.000031    25.000051    7c-8e-d4-a6-3d-5c    54-hg-gi-yx-ef-ge    32109246    1557300005    xz    18609765015    ls            1789070343
000000000000011    000000000000011    23.000031    24.000061    8c-g1-ed-7b-5f-1b    47-fy-vv-hs-ue-fd    32109246    1557300008    xz    18609765017    ls            1789080542
000000000000011    000000000000011    24.000031    25.000071    0c-76-2a-b1-3c-1a    f5-nw-hf-ud-ht-ea    32109246    1557300115    xz    18609765010    ls            1789082263

wechat_source1_1111101.txt

000000000000000    000000000000000    23.000000    24.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305988    andiy    18609765432    judy            1789098762
000000000000000    000000000000000    24.000000    25.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    andiy    18609765432    judy            1789098763
000000000000000    000000000000000    23.000000    24.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305988    andiy    18609765432    judy            1789098762
000000000000000    000000000000000    24.000000    25.000000    aa-aa-aa-aa-aa-aa    bb-bb-bb-bb-bb-bb    32109231    1557305985    andiy    18609765432    judy            1789098763
000000000000011    000000000000011    23.000011    24.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300388    xz    18609765012    ls            1789000653
000000000000011    000000000000011    24.000011    25.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300545    xz    18609765012    ls            1789000343
000000000000011    000000000000011    23.000011    24.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300658    xz    18609765012    ls            1789000542
000000000000011    000000000000011    24.000011    25.000011    1c-41-cd-b1-df-3f    1b-3d-zg-fg-ef-1b    32109246    1557300835    xz    18609765012    ls            1789000263
000000000000011    000000000000011    23.000021    24.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557300388    xz    18609765016    ls            1789001653
000000000000011    000000000000011    24.000021    25.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557302235    xz    18609765016    ls            1789001343
000000000000011    000000000000011    23.000021    24.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557303658    xz    18609765016    ls            1789001542
000000000000011    000000000000011    24.000021    25.000031    1c-31-5d-b1-6f-3f    3y-5g-g6-du-bv-2f    32109246    1557303835    xz    18609765016    ls            1789001263
000000000000011    000000000000011    23.000031    24.000041    4c-6f-c7-3d-a4-3d    9g-gd-3h-3k-ld-3f    32109246    1557300001    xz    18609765014    ls            1789050653
000000000000011    000000000000011    24.000031    25.000051    7c-8e-d4-a6-3d-5c    54-hg-gi-yx-ef-ge    32109246    1557300005    xz    18609765015    ls            1789070343
000000000000011    000000000000011    23.000031    24.000061    8c-g1-ed-7b-5f-1b    47-fy-vv-hs-ue-fd    32109246    1557300008    xz    18609765017    ls            1789080542
000000000000011    000000000000011    24.000031    25.000071    0c-76-2a-b1-3c-1a    f5-nw-hf-ud-ht-ea    32109246    1557300115    xz    18609765010    ls            1789082263
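Before feeding these files into the pipeline, it is worth confirming that every record has the same number of fields (a quick sketch, assuming the columns are tab-separated; a single distinct value per file means the file is well formed):

awk -F'\t' '{print NF}' mail_source1_1111101.txt | sort -u
awk -F'\t' '{print NF}' wechat_source1_1111101.txt | sort -u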

2、Kafka

Create a topic with 1 replica and 3 partitions
kafka-topics --zookeeper hadoop1:2181 --topic chl_test7 --create --replication-factor 1 --partitions 3

Delete a topic
kafka-topics --zookeeper hadoop1:2181 --delete --topic chl_test7

List all topics
kafka-topics --zookeeper hadoop1:2181 --list

Consume a topic from the beginning
kafka-console-consumer --bootstrap-server hadoop1:9092 --topic chl_test7 --from-beginning
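For a quick end-to-end test, records can also be pushed into the topic by hand with the console producer (a sketch; whether the streaming jobs can parse hand-pushed lines depends on the message format the ingestion layer normally produces):

kafka-console-producer --broker-list hadoop1:9092 --topic chl_test7 < wechat_source1_1111101.txt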

3、kafka2es

Start the Spark Streaming job

spark-submit --master yarn-cluster --num-executors 1 --driver-memory 500m --executor-memory 1g --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar chl_test7 chl_test7
spark-submit
--master yarn-cluster    // run on the YARN cluster
--num-executors 1        // number of executor processes
--driver-memory 500m     // driver memory
--executor-memory 1g     // memory per executor
--executor-cores 1       // cores (threads) per executor
--jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',')    // jars to ship with the job
--class com.hsiehchou.spark.streaming.kafka.kafka2es.Kafka2esStreaming    // main class to run
/usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar    // location of the application jar
chl_test7 chl_test7    // program arguments passed to the main class
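To confirm that the streaming job is running on the cluster, and to find its applicationId for the log commands in the Yarn section below, list the active YARN applications:

yarn application -list -appStates RUNNING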

4、Yarn

Dump the execution log of a YARN application to a file
yarn logs -applicationId application_1561627166793_0002 > log.log

View the log
more log.log

cat log.log

5、CDH port 7180 will not open

Check the cloudera-scm-server status
service cloudera-scm-server status

Check the cloudera-scm-server log
cat /var/log/cloudera-scm-server/cloudera-scm-server.log

Restart cloudera-scm-server
service cloudera-scm-server restart

6、CDH JDK setting (important)

Point the JDK used by CDH at:
/usr/local/jdk1.8

7、Alerting

Start the warning streaming job (WarningStreamingTask):
spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.warn.WarningStreamingTask /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar

8、Kibana Dev Tools

GET _search
{
  "query": {
    "match_all": {}
  }
}

GET  _cat/indices

DELETE tanslator_test1111

DELETE qq
DELETE wechat
DELETE mail

GET wechat

GET mail

GET _search

GET mail/_search

GET mail/_mapping

PUT mail

PUT mail/mail/_mapping
{
  "_source": {
    "enabled": true
  },
  "properties": {
    "imei":{"type": "keyword"},
    "imsi":{"type": "keyword"},
    "longitude":{"type": "double"},
    "latitude":{"type": "double"},
    "phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "collect_time":{"type": "long"},
    "send_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "send_time":{"type": "long"},
    "accept_mail":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "accept_time":{"type": "long"},
    "mail_content":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "mail_type":{"type": "keyword"},
     "id":{"type": "keyword"},
    "table":{"type": "keyword"},
    "filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
  }
}

GET qq/_search

GET qq/_mapping

PUT qq

PUT qq/qq/_mapping
{
  "_source": {
    "enabled": true
  },
  "properties": {
    "imei":{"type": "keyword"},
    "imsi":{"type": "keyword"},
    "longitude":{"type": "double"},
    "latitude":{"type": "double"},
    "phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "collect_time":{"type": "long"},
    "username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "message_time":{"type": "long"},
    "id":{"type": "keyword"},
    "table":{"type": "keyword"},
    "filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
  }
}

GET wechat/_search

GET wechat/_mapping

PUT wechat

PUT wechat/wechat/_mapping
{
  "_source": {
    "enabled": true
  },
  "properties": {
    "imei":{"type": "keyword"},
    "imsi":{"type": "keyword"},
    "longitude":{"type": "double"},
    "latitude":{"type": "double"},
    "phone_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_mac":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "device_number":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "collect_time":{"type": "long"},
    "username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "phone":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "object_username":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "send_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "accept_message":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "message_time":{"type": "long"},
    "id":{"type": "keyword"},
    "table":{"type": "keyword"},
    "filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}},
    "absolute_filename":{"type": "text","fields": {"keyword": {"ignore_above": 256,"type": "keyword"}}}
  }
}
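With the mappings in place, the keyword sub-fields support exact-match queries from Dev Tools; for example, an illustrative term query against the test data above:

GET wechat/_search
{
  "query": {
    "term": { "phone_mac.keyword": "aa-aa-aa-aa-aa-aa" }
  }
}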

9、Hive

Write data from Kafka into Hive

spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.Kafka2HiveTest /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
show tables;

hdfs dfs -ls /apps/hive/warehouse/external

hdfs dfs -rm -r /apps/hive/warehouse/external/mail

drop table mail;

desc qq;

select * from qq limit 1;

Note: if the Hive version in CDH does not match its corresponding Spark version, the query below cannot be executed.
select count(*) from qq;
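Assuming the external qq table mirrors the QQ fields above (it includes a phone column), simple aggregations run the same way and are subject to the same Hive/Spark version caveat:

select phone, count(*) as cnt from qq group by phone;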

Merge small files

crontab -e

0 1 * * * spark-submit --master local[1] --num-executors 1 --driver-memory 300m --executor-memory 500m --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hdfs.CombineHdfs /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar

Scheduled with crontab (the entry above runs the merge job daily at 01:00)

10、Zookeeper

Start the ZooKeeper client
zookeeper-client

Clear the stored consumer offsets
rmr /consumers/WarningStreamingTask2/offsets

rmr /consumers/Kafka2HiveTest/offsets

rmr /consumers/DataRelationStreaming1/offsets
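Before deleting anything, the stored consumer groups and offsets can be listed from the same client:

ls /consumers
ls /consumers/Kafka2HiveTest/offsets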

11、HBase

spark-submit --master local[1] --num-executors 1 --driver-memory 500m --executor-memory 1g --executor-cores 1 --jars $(echo /usr/chl/spark7/jars/*.jar | tr ' ' ',') --class com.hsiehchou.spark.streaming.kafka.kafka2hbase.DataRelationStreaming /usr/chl/spark7/xz_bigdata_spark-1.0-SNAPSHOT.jar
hbase shell

list

create 't1','cf'

desc 't1'

put 't1','aa-aa-aa-aa-aa-aa','cf:qq','66666666'

put 't1','aa-aa-aa-aa-aa-aa','cf:weixin','weixin1'

put 't1','aa-aa-aa-aa-aa-aa','cf:mail','66666@qq.com'

scan 't1'

Enable multiple versions on the table
alter 't1',{NAME=>'cf',VERSIONS=>50}

put 't1','aa-aa-aa-aa-aa-aa','cf:qq','77777777'

get 't1','aa-aa-aa-aa-aa-aa',{COLUMN=>'cf',VERSIONS=>10}

put 't1','aa-aa-aa-aa-aa-aa','cf:qq','55555555'

put 't1','aa-aa-aa-aa-aa-aa','cf:qq','88888888',1290300544

Run DataRelationStreaming
scan 'test:relation'

get 'test:username','andiy'

scan 'test:relation'

In the mail test data, change the MAC / mailbox values to generate new relation records, then query again:

get  'test:relation','',{COLUMN=>'cf',VERSIONS=>10}

disable 'test:imei'
drop 'test:imei'

disable 'test:imsi'
drop 'test:imsi'

disable 'test:phone'
drop 'test:phone'

disable 'test:phone_mac'
drop 'test:phone_mac'

disable 'test:relation'
drop 'test:relation'

disable 'test:send_mail'
drop 'test:send_mail'

disable 'test:username'
drop 'test:username'
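The disable/drop pairs above can also be run non-interactively by piping a script into the HBase shell (a sketch; extend the table list as needed):

hbase shell <<'EOF'
disable 'test:imei'
drop 'test:imei'
disable 'test:relation'
drop 'test:relation'
exit
EOF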

12、SpringCloud

Start the Eureka service registry

nohup java -cp /usr/chl/springcloud/eureka/xz_bigdata_springcloud_eureka-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.eureka.EurekaApplication &

View the log

tail -f nohup.out

Start the esquery microservice

nohup java -cp /usr/chl/springcloud/esquery/xz_bigdata_springcloud_esquery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.es.ESqueryApplication &

Start the hbasequery microservice

nohup java -cp /usr/chl/springcloud/hbasequery/xz_bigdata_springcloud_hbasequery-1.0-SNAPSHOT.jar com.hsiehchou.springcloud.HbaseQueryApplication &
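The three services run as plain JVM processes; jps -l prints each process together with its main class, which makes it easy to locate a PID when one of them needs to be restarted:

jps -l | grep com.hsiehchou.springcloud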

Author: 谢舟
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit 谢舟 when reposting.