学习的资料是官网的Programming Guide

https://spark.apache.org/docs/latest/graphx-programming-guide.html

 首先是GraphX的简介

GraphX是Spark中专门负责图和图并行计算的组件。

GraphX通过引入了图形概念来继承了Spark RDD:一个连接节点和边的有向图

为了支持图计算,GraphX引入了一些算子: subgraphjoinVertices, and aggregateMessages

和 Pregel API,此外还有一些algorithmsbuilders 来简化图分析任务。

 

关于构建 节点Vertex边Edge

1.如果需要将节点定义成一个类

package graphx

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.graphstream.graph.implementations.{AbstractEdge, SingleGraph, SingleNode}

/**
  * Created by common on 18-1-22.
  */

// 抽象节点
class VertexProperty()
// User节点
case class UserProperty(val name: String) extends VertexProperty
// Product节点
case class ProductProperty(val name: String, val price: Double) extends VertexProperty

object GraphxLearning {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("GraphX").setMaster("local")
    val sc = new SparkContext(conf)

    // The graph might then have the type:
    var graph: Graph[VertexProperty, String] = null

  }
}

和节点一样,边也可以定义成一个class,同时Graph类需要和定义的节点和边的类型相对应

class Graph[VD, ED] {    // VD表示节点类型,ED表示边类型
  val vertices: VertexRDD[VD]
  val edges: EdgeRDD[ED]
}

 

2.如果节点的类型比较简单,例如只是一个String或者(String,String),就不需要定义成一个类

package graphx

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.graphstream.graph.implementations.{AbstractEdge, SingleGraph, SingleNode}

/**
  * Created by common on 18-1-22.
  */
object GraphxLearning {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("GraphX").setMaster("local")
    val sc = new SparkContext(conf)

    // Create an RDD for the vertices
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
    // Create an RDD for edges
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
    //Define a default user in case there are relationship with missing user
    val defaultUser = ("John Doe", "Missing")

    // 使用多个RDDs建立一个Graph,Graph的类型分别是节点加上边的类型,有两种节点,一种有ID,一种没有
    val srcGraph: Graph[(String, String), String] = Graph(users, relationships, defaultUser)

  }
}