Spark Example 1: WordCount - 白红宇

Published: 2019-06-25

IDE: Eclipse for Scala (Scala IDE)

Scala version: 2.10.4

Spark: 1.1.1

File contents (test.txt, words separated by commas to match the split(",") used in the code):

hello,world

hello,word

world,word,hello

1. Create a new Scala project

2. Add the Spark jars to the project's build path

3. Code

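import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicit conversions that provide reduceByKey

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally inside a single JVM
    val conf = new SparkConf().setAppName("word Count").setMaster("local")
    val sc = new SparkContext(conf)

    // Each element of textFile is one line of test.txt
    val textFile = sc.textFile("test.txt")

    // Split lines into words, pair each word with 1, then sum the 1s per word
    val mapRdd = textFile.flatMap(line => line.split(",")).map(x => (x, 1)).reduceByKey(_ + _)

    mapRdd.collect().foreach(println)
    sc.stop()
  }
}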

4. Run

Start spark-shell from Spark's bin directory; once Spark has started, the result of the run is:

(hello,3)

(word,2)

(world,2)

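Code analysis:

1. import org.apache.spark.SparkContext._ brings into scope the implicit conversions that add pair-RDD methods such as reduceByKey to an RDD of tuples. (Spark 1.3 and later find these conversions automatically, so the import only matters on older versions such as the 1.1.1 used here.)

Without it, compilation fails with: value reduceByKey is not a member of org.apache.spark.rdd.RDD[(String, Int)]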
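The mechanism behind that import can be sketched without Spark. The example below is only an illustration in plain Scala: PairOps, Conversions and toPairOps are hypothetical names standing in for Spark 1.1's PairRDDFunctions and the implicit rddToPairRDDFunctions conversion in the SparkContext object, and a local Seq stands in for the RDD.

```scala
import scala.language.implicitConversions

object ImplicitSketch {
  // Hypothetical stand-in for Spark's PairRDDFunctions: it holds the extra
  // methods that only make sense for collections of key/value pairs.
  class PairOps[K, V](pairs: Seq[(K, V)]) {
    def reduceByKey(f: (V, V) => V): Map[K, V] =
      pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).reduce(f) }
  }

  // The conversion lives in its own object, so it only applies after an
  // explicit import, much like `import org.apache.spark.SparkContext._`.
  object Conversions {
    implicit def toPairOps[K, V](pairs: Seq[(K, V)]): PairOps[K, V] = new PairOps(pairs)
  }

  def demo(): Map[String, Int] = {
    import Conversions._ // remove this import and the next line no longer compiles
    Seq(("hello", 1), ("word", 1), ("hello", 1)).reduceByKey(_ + _)
  }
}
```

Seq itself has no reduceByKey method; the compiler only accepts the call because the imported conversion wraps the Seq in PairOps. This is the same reason the compile error disappears once the import is present.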

2. The difference between map and flatMap

Consider these three snippets:

textFile.map(_.split(",")).collect().foreach(println)

textFile.map(_.split(",")).collect().foreach(x=> println(x.mkString(",")))

textFile.flatMap(_.split(",")).collect().foreach(println)

Their outputs are, respectively:

Snippet 1:

[Ljava.lang.String;@736caf7a

[Ljava.lang.String;@4ce7fffa

[Ljava.lang.String;@497486b3

Snippet 2:

hello,world

hello,word

world,word,hello

Snippet 3:

hello

world

hello

word

world

word

hello

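Snippets 1 and 2 show that map produces one array per input line, so its result is:

Array(Array("hello","world"),Array("hello","word"),Array("world","word","hello"))

The result of flatMap, by contrast, is a single flat collection:

Array("hello","world","hello","word","world","word","hello")

In other words, flatMap is map followed by flattening the nested results into one collection.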
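The same contrast can be reproduced without a Spark cluster. The sketch below uses a local List in place of the RDD and assumes the three sample lines are comma-separated, as the split(",") calls require; the object and value names are only for illustration.

```scala
object MapVsFlatMap {
  // Same three lines as the sample file; a local List stands in for the RDD.
  val lines = List("hello,world", "hello,word", "world,word,hello")

  // map keeps one result per input line: a list of three arrays.
  val mapped: List[Array[String]] = lines.map(_.split(","))

  // flatMap concatenates the per-line arrays into one flat list of words.
  val flatMapped: List[String] = lines.flatMap(_.split(","))

  def main(args: Array[String]): Unit = {
    mapped.foreach(arr => println(arr.mkString(","))) // one line per array
    flatMapped.foreach(println)                       // one line per word
  }
}
```

Flattening the result of map by hand gives the same words as flatMap, which is the "map, then flatten" relationship described above.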

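What kind of value is [Ljava.lang.String;@736caf7a? It is the JVM's default string form of a String array: "[L" marks an array of objects, java.lang.String is the element class, and the part after "@" is the array's hash code in hexadecimal. To see this, enter the following in the Scala REPL:

val a=Array("hello")

println(a) prints [Ljava.lang.String;@cc4a0dd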
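As a small illustration of that default string form and of ways to get readable output instead, the sketch below inspects an array's runtime class name and prints its contents with mkString and toList. The object and value names are made up for the example.

```scala
object ArrayPrinting {
  val a = Array("hello", "world")

  // JVM name of the runtime class: "[" means array, "L...;" names the element class.
  val className: String = a.getClass.getName

  // Readable alternatives to the default toString.
  val joined: String = a.mkString(",")
  val asList: String = a.toList.toString

  def main(args: Array[String]): Unit = {
    println(a)         // class name, "@", then a hex hash code that differs between runs
    println(className)
    println(joined)
    println(asList)
  }
}
```

This is why snippet 2 calls mkString before printing: the array is converted to a string explicitly rather than through the inherited toString.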

3. Output of each stage of the program

flatMap --> map --> reduceByKey

Array("hello","world","hello","word","world","word","hello") -->

Array[(String, Int)](("hello",1),("world",1),("hello",1),("word",1),("world",1),("word",1),("hello",1)) -->

Array[(String, Int)](("hello",3),("world",2),("word",2))
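The three stages can be traced step by step on local collections. This is a sketch rather than Spark code: a List stands in for the RDD, groupBy followed by a sum plays the role of reduceByKey(_+_), and the sample lines are assumed to be comma-separated.

```scala
object PipelineStages {
  val lines = List("hello,world", "hello,word", "world,word,hello")

  // Stage 1 (flatMap): split every line into words.
  val words: List[String] = lines.flatMap(_.split(","))

  // Stage 2 (map): pair every word with an initial count of 1.
  val pairs: List[(String, Int)] = words.map(w => (w, 1))

  // Stage 3 (local analogue of reduceByKey): group by word, then sum the counts.
  val counts: Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => word -> ps.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    println(words)
    println(pairs)
    counts.foreach(println) // iteration order of a Map is not guaranteed
  }
}
```

As with collect() on the RDD, the order in which the final pairs are printed should not be relied on; only the counts themselves are fixed.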

4. Finding the most frequent word

flatMap --> map --> reduceByKey --> reduce

val maxNum= mapRdd.reduce((a,b)=>if (a._2>b._2) a else b)

println(maxNum)

Output: (hello,3)
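The reduce step can be checked on a local list of the final counts. The sketch below applies the same comparison function and, for comparison, the standard maxBy method of Scala collections; the object and value names are illustrative only.

```scala
object MostFrequent {
  // The word counts produced by the previous stage.
  val counts = List(("hello", 3), ("world", 2), ("word", 2))

  // Same function as in the text: keep whichever pair has the larger count.
  val viaReduce: (String, Int) = counts.reduce((a, b) => if (a._2 > b._2) a else b)

  // On a local collection, maxBy expresses the same idea more directly.
  val viaMaxBy: (String, Int) = counts.maxBy(_._2)

  def main(args: Array[String]): Unit = {
    println(viaReduce)
    println(viaMaxBy)
  }
}
```

If two words shared the highest count, which of them reduce returns would depend on the order in which pairs are combined, so on an RDD the winner of a tie should not be relied on.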

Reposted from: https://www.cnblogs.com/360Spark/p/4715360.html
