For more code, see: https://github.com/xubo245/SparkLearning

ADAM Learning 2: Using adam-shell

Environment:

Cluster: Ubuntu 14.04 + Spark 1.5.2 + Scala 2.10

// Local: Windows 7 64-bit + Eclipse 4.3.2 + Scala 2.10.4



Code:

import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}

val ac = new ADAMContext(sc)
// Load alignments from disk
val reads = ac.loadAlignments("/xubo/adam/output/small.adam",
  projection = Some(
    Projection(
      AlignmentRecordField.sequence,
      AlignmentRecordField.readMapped,
      AlignmentRecordField.mapq
    )
  )
)

// Generate, count and sort 21-mers
val kmers = reads.flatMap(_.getSequence.sliding(21).map(k => (k, 1L))).reduceByKey(_ + _).map(_.swap).sortByKey(ascending = false)


// Print the top 10 most common 21-mers
kmers.take(10).foreach(println)
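
To make the counting step easier to follow, here is a minimal sketch of the same idea using only plain Scala collections, with no Spark or ADAM involved. The names `KmerSketch` and `countKmers` are hypothetical and exist only for this illustration. One deliberate difference from the script above: `sliding(k)` returns a single shorter window when a sequence has fewer than k characters, so the sketch filters those out, while the script does not.

```scala
// Illustration only: plain Scala collections stand in for the RDD.
object KmerSketch {
  // Count every k-mer in the given sequences, most frequent first.
  // Ties are broken alphabetically so that the result is deterministic.
  def countKmers(seqs: Seq[String], k: Int): Seq[(Long, String)] =
    seqs
      .flatMap(_.sliding(k))      // every window of length k (same call as in the script)
      .filter(_.length == k)      // drop the short window produced by sequences shorter than k
      .groupBy(identity)          // local stand-in for map(k => (k, 1L)) + reduceByKey(_ + _)
      .toSeq                      // leave the Map before re-keying, so equal counts do not collide
      .map { case (kmer, hits) => (hits.size.toLong, kmer) }    // same shape as map(_.swap)
      .sortBy { case (count, kmer) => (-count, kmer) }          // descending by count
}

// The 3-mers of "ACGTACGT" are ACG, CGT, GTA, TAC, ACG, CGT.
KmerSketch.countKmers(Seq("ACGTACGT"), 3).foreach(println)
// (2,ACG)
// (2,CGT)
// (1,GTA)
// (1,TAC)
```

The script puts the count first in each pair for the same reason: `sortByKey` sorts on the first element, so the swap makes the count the sort key.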



Script path: hadoop@Mcnode1:~/cloud/adam/xubo/testAdam34/kmer.scala


Output:

hadoop@Mcnode1:~/cloud/adam/xubo/testAdam34$ adam-shell -i kmer.scala 
-i kmer.scala --jars /home/hadoop/cloud/adam/adam-cli/target/adam-cli_2.10-0.18.3-SNAPSHOT.jar
Using SPARK_SHELL=/home/hadoop/cloud/spark-1.5.2//bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
Loading kmer.scala...
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}
ac: org.bdgenomics.adam.rdd.ADAMContext = org.bdgenomics.adam.rdd.ADAMContext@31264ff
reads: org.apache.spark.rdd.RDD[org.bdgenomics.formats.avro.AlignmentRecord] = MapPartitionsRDD[1] at map at ADAMContext.scala:167
kmers: org.apache.spark.rdd.RDD[(Long, String)] = ShuffledRDD[5] at sortByKey at <console>:27
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
(4,TCTTTCTTTCTTTCTTTCTTT)
(4,TTTCTTTCTTTCTTTCTTTCT)
(3,CTTTCTTTCTTTCTTTCTTTC)
(3,TTCTTTCTTTCTTTCTTTCTT)
(2,TCTTTTTCTTTCTTTCTTTCT)
(2,TTCTTTTTCTTTCTTTCTTTC)
(2,TTTCTTTTTCTTTCTTTCTTT)
(1,ATTGGATATCCTCCCAAATTT)
(1,AGGCATGAGGCACCGCGCCTG)
(1,CTACTGCCCAACAAGTCCCTA)
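
One way to read this result: the four most frequent 21-mers appear to be the same TTTC tandem repeat seen at its four possible offsets. The sketch below checks this with plain Scala. The object name `RepeatCheck` is hypothetical, and the strings are copied from the output above.

```scala
// Illustration only: check that the top four 21-mers are windows of a TTTC repeat.
object RepeatCheck {
  // The four most frequent 21-mers, copied from the output above.
  val top4 = Seq(
    "TCTTTCTTTCTTTCTTTCTTT",
    "TTTCTTTCTTTCTTTCTTTCT",
    "CTTTCTTTCTTTCTTTCTTTC",
    "TTCTTTCTTTCTTTCTTTCTT"
  )

  // A TTTC tandem repeat long enough to contain 21-character windows at every offset.
  val repeat = "TTTC" * 10

  // All distinct 21-character windows of the repeat; the period is 4, so there are 4 of them.
  val windows = repeat.sliding(21).toSet

  // True when every one of the top four 21-mers is a window of the repeat.
  val allFromRepeat = top4.forall(windows.contains)
}

println(RepeatCheck.windows.size)   // 4
println(RepeatCheck.allFromRepeat)  // true
```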




References

[1] https://github.com/bigdatagenomics/adam