Writing Scala Code to Extract Data from a MySQL Table into the Hive ODS Layer with Spark

scala spark mysql | 2023-09-22 17:09:03 | 758 views | 安东尼


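Task: extract the full contents of the Production table in the MySQL metast database into the production table of the Hive ods database. Column order and types stay unchanged. Add a static partition whose partition column is of type String and whose value is the day before the current date (format yyyyMMdd).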
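The partition value in the task (yesterday, formatted as yyyyMMdd) can be computed on its own before any Spark code is involved. The following is a minimal sketch using java.time; the object name PartitionDate and the optional `today` parameter are illustrative assumptions, added so the function can be checked with a fixed date.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of the original article
object PartitionDate {
  // DateTimeFormatter is immutable and thread-safe, so one shared instance is fine
  private val fmt: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyyMMdd")

  /** Returns the day before `today`, formatted as yyyyMMdd. */
  def yesterday(today: LocalDate = LocalDate.now()): String =
    today.minusDays(1).format(fmt)

  def main(args: Array[String]): Unit = {
    // Value that would be used for the partition if the job ran right now
    println(yesterday())
    // Fixed date to show the month boundary: prints 20230228
    println(yesterday(LocalDate.of(2023, 3, 1)))
  }
}
```

Passing the date in as a parameter keeps the function deterministic, which makes reruns for a past day straightforward.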

Create a Maven project in IntelliJ IDEA

Configure the pom file

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.tledu</groupId>
  <artifactId>llll</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2018</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.11</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
    <spec2.version>4.2.0</spec2.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.compat.version}</artifactId>
      <version>2.3.2</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.compat.version}</artifactId>
      <version>2.3.2</version>
      <scope>provided</scope>
    </dependency>

    <!-- Kept at the same version as spark-core and spark-sql -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_${scala.compat.version}</artifactId>
      <version>2.3.2</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.0.23</version>
    </dependency>

    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.compat.version}</artifactId>
      <version>3.0.5</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_${scala.compat.version}</artifactId>
      <version>${spec2.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-junit_${scala.compat.version}</artifactId>
      <version>${spec2.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.21.0</version>
        <configuration>
          <!-- Tests will be run with scalatest-maven-plugin instead -->
          <skipTests>true</skipTests>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
        <version>2.0.0</version>
        <configuration>
          <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
          <junitxml>.</junitxml>
          <filereports>TestSuiteReport.txt</filereports>
          <!-- Comma separated list of JUnit test class names to execute -->
          <jUnitClasses>samples.AppTest</jUnitClasses>
        </configuration>
        <executions>
          <execution>
            <id>test</id>
            <goals>
              <goal>test</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

      <!-- Builds a fat jar (jar-with-dependencies) during the package phase -->
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

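Import the Scala SDK
I am demonstrating on Ubuntu here; the steps are the same on other systems.

Click the + button to add the SDK. Note that the Scala version must match the one configured in the pom file.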
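One quick way to confirm which Scala version a project actually runs with is to print it from a tiny object. This is only a sketch: the object name ScalaVersionCheck and the helper binaryVersion are illustrative, and the comparison against the pom's scala.compat.version is done by eye. The same check is useful again at submit time, since a Spark build on the cluster is compiled against one specific Scala binary version.

```scala
// Hypothetical helper, not part of the original article
object ScalaVersionCheck {
  /** Reduces a full version such as "2.11.11" to its binary version "2.11". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the scala-library that is on the classpath right now
    val running = scala.util.Properties.versionNumberString
    println(s"Scala runtime: $running (binary version ${binaryVersion(running)})")
  }
}
```

The printed binary version should equal the value of scala.compat.version in the pom.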

Create a scala directory and mark it as the sources root, then create a new object inside it.

The code is as follows

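import java.text.SimpleDateFormat
import java.util.Calendar

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object demo01 {

  // Returns yesterday's date as a yyyyMMdd string; used as the partition value
  def getYesterday(): String = {
    val dateFormat: SimpleDateFormat = new SimpleDateFormat("yyyyMMdd")
    val cal: Calendar = Calendar.getInstance()
    cal.add(Calendar.DATE, -1)
    dateFormat.format(cal.getTime())
  }

  def main(args: Array[String]): Unit = {
    // source start
    val spark = SparkSession.builder()
      // A master set in code takes precedence over spark-submit's --master;
      // remove this line when submitting to YARN
      .master("local[1]")
      .config("spark.sql.parquet.writeLegacyFormat", true)
      // Dynamic partition overwrite: without it, overwriting a table that has
      // 100 partitions leaves only the one partition just written
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .config("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .enableHiveSupport()
      .getOrCreate()

    // Connect Spark to MySQL; replace <ip-address> with the MySQL host
    val url = "jdbc:mysql://<ip-address>:3306/shtd_industry?useUnicode=true&characterEncoding=utf8&useSSL=false"

    val readerCustomerInf = spark.read.format("jdbc")
      .option("url", url)
      // Connector/J 8.x driver class, matching the 8.0.23 dependency in the pom
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("user", "root")
      .option("password", "123456")
      .option("dbtable", "<mysql_table_name>")
      .load() // loaded as a DataFrame
    // source end

    // ETL: add the partition column
    val addPtDF = readerCustomerInf.withColumn("etl_date", lit(getYesterday()))
    val tableName = "<hive_table_name>"

    // Switch to the Hive ods database
    import spark.sql
    sql("use ods")

    // sink: stored in Spark's default source format (Parquet unless
    // spark.sql.sources.default is changed); add .format("orc") before
    // saveAsTable to store ORC instead
    addPtDF.write.mode("overwrite").partitionBy("etl_date").saveAsTable(tableName)
    spark.table(tableName).show()
  }
}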
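The task statement asks for a static partition, while the code above writes through partitionBy, where Spark derives the partition from the etl_date column value. One alternative, shown here only as a sketch, is to name the partition explicitly in a Hive-style INSERT OVERWRITE ... PARTITION statement. The object StaticPartitionSql, the view name production_src and the example date are assumptions for illustration; the table name ods.production comes from the task statement.

```scala
// Hypothetical helper, not part of the original article
object StaticPartitionSql {
  /**
   * Builds an INSERT OVERWRITE statement that targets one fixed (static) partition.
   * The source view must NOT contain the partition column, because its value
   * is supplied by the PARTITION clause.
   */
  def insertOverwrite(table: String, sourceView: String,
                      partCol: String, partValue: String): String =
    s"INSERT OVERWRITE TABLE $table PARTITION ($partCol='$partValue') " +
      s"SELECT * FROM $sourceView"

  def main(args: Array[String]): Unit = {
    // Example with an assumed view name and partition date
    println(insertOverwrite("ods.production", "production_src", "etl_date", "20230921"))
  }
}
```

Inside the job, the DataFrame read from MySQL (readerCustomerInf, without the extra etl_date column) would be registered with createOrReplaceTempView("production_src") and the generated statement passed to spark.sql. This assumes the target table already exists in Hive with etl_date declared as a String partition column.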

Package the finished code and send it to the Linux machine


Upload the packaged jar to the cluster

The rz command is commonly used for the upload.

You can write a script to run the jar:

vi spark.sh


/opt/module/spark-3.1.1-yarn/bin/spark-submit \
--class <class-to-run> \
--master yarn \
--deploy-mode client \
--driver-memory 2g \
--executor-memory 1g \
--executor-cores 2 \
/<path-to-jar>/<your-jar>


Save and exit.

Run the script with sh spark.sh
The MySQL data is then imported into the ods layer of the Hive database.

Source: https://blog.csdn.net/qq_41289004/article/details/127737908
