Latest changes in Spark 2.0.1

Sub-task

Bug

  • [SPARK-10683] - Source code missing for SparkR test JAR
  • [SPARK-11227] - Spark 1.5+ in HDFS HA mode throws java.net.UnknownHostException: nameservice1
  • [SPARK-12666] - spark-shell --packages cannot load artifacts which are publishLocal'd by SBT
  • [SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
  • [SPARK-14209] - Application failure during preemption
  • [SPARK-14818] - Move sketch and mllibLocal out from mima exclusion
  • [SPARK-15083] - History Server would OOM due to unlimited TaskUIData in some stages
  • [SPARK-15285] - Generated SpecificSafeProjection.apply method grows beyond 64 KB
  • [SPARK-15382] - monotonicallyIncreasingId doesn't work when data is upsampled
  • [SPARK-15390] - Memory management issue in complex DataFrame join and filter
  • [SPARK-15541] - SparkContext.stop throws error
  • [SPARK-15869] - HTTP 500 and NPE on streaming batch details page
  • [SPARK-15899] - file scheme should be used correctly
  • [SPARK-15989] - PySpark SQL python-only UDTs don't support nested types
  • [SPARK-16062] - PySpark SQL python-only UDTs don't work well
  • [SPARK-16321] - [Spark 2.0] Performance regression when reading parquet and using PPD and non-vectorized reader
  • [SPARK-16334] - SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException
  • [SPARK-16409] - regexp_extract with optional groups causes NPE
  • [SPARK-16439] - Incorrect information in SQL Query details
  • [SPARK-16440] - Undeleted broadcast variables in Word2Vec causing OOM for long runs
  • [SPARK-16457] - Wrong messages when CTAS with a Partition By clause
  • [SPARK-16460] - Spark 2.0 CSV ignores NULL value in Date format
  • [SPARK-16462] - Spark 2.0 CSV does not cast null values to certain data types properly
  • [SPARK-16522] - [MESOS] Spark application throws exception on exit
  • [SPARK-16533] - Spark application not handling preemption messages
  • [SPARK-16550] - Caching data with replication doesn't replicate data
  • [SPARK-16558] - examples/mllib/LDAExample should use MLVector instead of MLlib Vector
  • [SPARK-16563] - Repeated Spark SQL Thrift server fetchResults calls return empty for the ExecuteStatement operation
  • [SPARK-16586] - spark-class crashes with "[: too many arguments" instead of displaying the correct error message
  • [SPARK-16597] - DataFrame DateType is written as an int (days since epoch) by the CSV writer
  • [SPARK-16610] - When writing ORC files, orc.compress should not be overridden if users do not set "compression" in the options
  • [SPARK-16613] - RDD.pipe returns values for empty partitions
  • [SPARK-16632] - Vectorized parquet reader fails to read certain fields from Hive tables
  • [SPARK-16633] - lag/lead using constant input values does not return the default value when the offset row does not exist
  • [SPARK-16634] - GenericArrayData can't be loaded in certain JVMs
  • [SPARK-16639] - query fails if HAVING condition contains grouping column
  • [SPARK-16642] - ResolveWindowFrame should not be triggered on UnresolvedFunctions
  • [SPARK-16644] - constraints propagation may fail the query
  • [SPARK-16646] - LEAST doesn't accept numeric arguments with different data types
  • [SPARK-16648] - LAST_VALUE(FALSE) OVER () throws IndexOutOfBoundsException
  • [SPARK-16656] - CreateTableAsSelectSuite is flaky
  • [SPARK-16664] - Spark 1.6.2 - Persist call on DataFrames with more than 200 columns is wiping out the data
  • [SPARK-16672] - SQLBuilder should not raise exceptions on EXISTS queries
  • [SPARK-16686] - Dataset.sample with seed: result seems to depend on downstream usage
  • [SPARK-16698] - json parsing regression - "." in keys
  • [SPARK-16699] - Fix performance bug in hash aggregate on long string keys
  • [SPARK-16700] - StructType doesn't accept Python dicts anymore
  • [SPARK-16703] - Extra space in WindowSpecDefinition SQL representation
  • [SPARK-16711] - YarnShuffleService doesn't re-init properly on YARN rolling upgrade
  • [SPARK-16714] - Fail to create decimal arrays with literals having different inferred precisions and scales
  • [SPARK-16715] - Fix a potential ExprId conflict for SubexpressionEliminationSuite."Semantic equals and hash"
  • [SPARK-16721] - Lead/lag needs to respect nulls
  • [SPARK-16724] - Expose DefinedByConstructorParams
  • [SPARK-16729] - Spark should throw analysis exception for invalid casts to date type
  • [SPARK-16730] - Spark 2.0 breaks various Hive cast functions
  • [SPARK-16740] - joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
  • [SPARK-16748] - Errors thrown by UDFs cause TreeNodeException when the query has an ORDER BY clause
  • [SPARK-16750] - ML GaussianMixture training failed due to feature column type mistake
  • [SPARK-16751] - Upgrade derby to 10.12.1.1 from 10.11.1.1
  • [SPARK-16770] - Spark shell not usable with German keyboard due to JLine version
  • [SPARK-16781] - java launched by PySpark as gateway may not be the same java used in the Spark environment
  • [SPARK-16785] - dapply doesn't return array or raw columns
  • [SPARK-16787] - SparkContext.addFile() should not fail if called twice with the same file
  • [SPARK-16791] - casting structs fails on Timestamp fields (interpreted mode only)
  • [SPARK-16802] - joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
  • [SPARK-16818] - Exchange reuse incorrectly reuses scans over different sets of partitions
  • [SPARK-16831] - CrossValidator reports incorrect avgMetrics
  • [SPARK-16836] - Hive date/time function error
  • [SPARK-16837] - TimeWindow incorrectly drops slideDuration in constructors
  • [SPARK-16850] - Improve error message for greatest/least
  • [SPARK-16873] - force spill NPE
  • [SPARK-16880] - Improve ANN training, add training data persist if needed
  • [SPARK-16883] - SQL decimal type is not properly cast to number when collecting SparkDataFrame
  • [SPARK-16901] - Hive settings in hive-site.xml may be overridden by Hive's default values
  • [SPARK-16905] - Support SQL DDL: MSCK REPAIR TABLE
  • [SPARK-16907] - Parquet table reading performance regression when vectorized record reader is not used
  • [SPARK-16922] - Query with Broadcast Hash join fails due to executor OOM in Spark 2.0
  • [SPARK-16925] - Spark tasks which cause JVM to exit with a zero exit code may cause app to hang in Standalone mode
  • [SPARK-16926] - Partition columns are present in columns metadata for partition but not table
  • [SPARK-16936] - Case Sensitivity Support for Refresh Temp Table
  • [SPARK-16942] - CREATE TABLE LIKE generates an External table when the source table is an External Hive Serde table
  • [SPARK-16943] - CREATE TABLE LIKE generates a non-empty table when the source is a data source table
  • [SPARK-16950] - fromOffsets parameter in Kafka's Direct Streams does not work in python3
  • [SPARK-16953] - Make requestTotalExecutors public to be consistent with requestExecutors/killExecutors
  • [SPARK-16955] - Using ordinals in ORDER BY causes an analysis error when the query has a GROUP BY clause using ordinals
  • [SPARK-16959] - Table Comment in the CatalogTable returned from HiveMetastore is Always Empty
  • [SPARK-16961] - Utils.randomizeInPlace does not shuffle arrays uniformly
  • [SPARK-16966] - App Name is a randomUUID even when "spark.app.name" exists
  • [SPARK-16975] - Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.6.2
  • [SPARK-16991] - Full outer join followed by inner join produces wrong results
  • [SPARK-16994] - Filter and limit are illegally permuted
  • [SPARK-16995] - TreeNodeException when flat mapping RelationalGroupedDataset created from DataFrame containing a column created with lit/expr
  • [SPARK-17010] - [MINOR] Wrong description in memory management document
  • [SPARK-17013] - negative numeric literal parsing
  • [SPARK-17016] - group-by/order-by ordinal should throw AnalysisException instead of UnresolvedException
  • [SPARK-17022] - Potential deadlock in driver handling message
  • [SPARK-17027] - PolynomialExpansion.choose is prone to integer overflow
  • [SPARK-17038] - StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
  • [SPARK-17051] - we should use hadoopConf in InsertIntoHiveTable
  • [SPARK-17056] - Fix a wrong assert in MemoryStore
  • [SPARK-17061] - Incorrect results returned following a join of two datasets and a map step where total number of columns >100
  • [SPARK-17065] - Improve the error message when encountering an incompatible DataSourceRegister
  • [SPARK-17066] - dateFormat should be used when writing DataFrames as CSV files
  • [SPARK-17086] - QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data
  • [SPARK-17093] - Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
  • [SPARK-17098] - "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
  • [SPARK-17099] - Incorrect result when HAVING clause is added to group by query
  • [SPARK-17100] - pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
  • [SPARK-17104] - LogicalRelation.newInstance should follow the semantics of MultiInstanceRelation
  • [SPARK-17110] - Pyspark with locality ANY throws java.io.StreamCorruptedException
  • [SPARK-17113] - Job failure due to Executor OOM in offheap mode
  • [SPARK-17114] - Adding a 'GROUP BY 1' where first column is literal results in wrong answer
  • [SPARK-17115] - Improve the performance of UnsafeProjection for wide table
  • [SPARK-17117] - `SELECT 1 / NULL` throws AnalysisException, while `SELECT 1 * NULL` works
  • [SPARK-17120] - Analyzer incorrectly optimizes plan to empty LocalRelation
  • [SPARK-17124] - RelationalGroupedDataset.agg should be order preserving and allow duplicate column names
  • [SPARK-17158] - Improve error message for numeric literal parsing
  • [SPARK-17160] - GetExternalRowField does not properly escape field names, causing generated code not to compile
  • [SPARK-17162] - Range does not support SQL generation
  • [SPARK-17167] - Issue Exceptions when Analyze Table on In-Memory Cataloged Tables
  • [SPARK-17180] - Unable to Alter the Temporary View Using ALTER VIEW command
  • [SPARK-17182] - CollectList and CollectSet should be marked as non-deterministic
  • [SPARK-17194] - When emitting SQL for string literals Spark should use single quotes, not double
  • [SPARK-17205] - Literal.sql does not properly convert NaN and Infinity literals
  • [SPARK-17210] - sparkr.zip is not distributed to executors when running SparkR in RStudio
  • [SPARK-17211] - Broadcast join produces incorrect results when compressed OOPs differ between driver and executor
  • [SPARK-17216] - Event timeline for a stage doesn't cover 100% of the timeline bar in Chrome
  • [SPARK-17228] - Not infer/propagate non-deterministic constraints
  • [SPARK-17230] - Writing decimal to csv will result in an empty string if the decimal exceeds (20, 18)
  • [SPARK-17243] - Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
  • [SPARK-17244] - Joins should not pushdown non-deterministic conditions
  • [SPARK-17252] - Performing arithmetic in VALUES can lead to ClassCastException / MatchErrors during query parsing
  • [SPARK-17253] - Left join where ON clause does not reference the right table produces analysis error
  • [SPARK-17261] - Using HiveContext after re-creating SparkContext in Spark 2.0 throws "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext"
  • [SPARK-17264] - DataStreamWriter should document that it only supports Parquet for now
  • [SPARK-17296] - Spark SQL: cross join + two joins = BUG
  • [SPARK-17299] - TRIM/LTRIM/RTRIM strips characters other than spaces
  • [SPARK-17306] - QuantileSummaries doesn't compress
  • [SPARK-17309] - ALTER VIEW should throw exception if the view does not exist
  • [SPARK-17323] - ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.
  • [SPARK-17335] - Creating Hive table from Spark data
  • [SPARK-17336] - Repeated calls to sbin/spark-config.sh cause duplicate ${PYTHONPATH} entries
  • [SPARK-17339] - Fix SparkR tests on Windows
  • [SPARK-17342] - Style of event timeline is broken
  • [SPARK-17352] - Executor computing time can be negative because of a calculation error
  • [SPARK-17353] - CREATE TABLE LIKE statements when Source is a VIEW
  • [SPARK-17354] - java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date
  • [SPARK-17355] - Work around exception thrown by HiveResultSetMetaData.isSigned
  • [SPARK-17356] - A large Metadata field in Alias can cause OOM when calling TreeNode.toJSON
  • [SPARK-17358] - Cached table (parquet/orc) should be shared between beelines
  • [SPARK-17364] - Cannot query Hive table starting with a number
  • [SPARK-17369] - MetastoreRelation toJSON throws exception
  • [SPARK-17370] - Shuffle service files not invalidated when a slave is lost
  • [SPARK-17376] - Spark version should be available in R
  • [SPARK-17391] - Fix Two Test Failures After Backport
  • [SPARK-17396] - Thread count keeps increasing when querying an external CSV partitioned table
  • [SPARK-17418] - Spark release must NOT distribute Kinesis related assembly artifact
  • [SPARK-17438] - Master UI should show the correct core limit when `ApplicationInfo.executorLimit` is set
  • [SPARK-17439] - QuantileSummaries returns the wrong result after compression
  • [SPARK-17442] - Additional arguments in write.df are not passed to data source
  • [SPARK-17463] - Serialization of accumulators in heartbeats is not thread-safe
  • [SPARK-17465] - Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
  • [SPARK-17474] - Python UDF does not work between Sort and Limit
  • [SPARK-17491] - MemoryStore.putIteratorAsBytes() may silently lose values when KryoSerializer is used
  • [SPARK-17494] - Floor/ceil of decimal returns wrong result if it's in compact format
  • [SPARK-17502] - Multiple Bugs in DDL Statements on Temporary Views
  • [SPARK-17503] - Memory leak in MemoryStore when unable to cache the whole RDD in memory
  • [SPARK-17511] - Dynamic allocation race condition: Containers getting marked failed while releasing
  • [SPARK-17512] - Specifying remote files for Python based Spark jobs in Yarn cluster mode not working
  • [SPARK-17514] - df.take(1) and df.limit(1).collect() perform differently in Python
  • [SPARK-17515] - CollectLimit.execute() should perform per-partition limits
  • [SPARK-17521] - Error when using sparkContext.makeRDD(Seq())
  • [SPARK-17525] - SparkContext.clearFiles() still present in the PySpark bindings though the underlying Scala method was removed in Spark 2.0
  • [SPARK-17531] - Don't initialize Hive Listeners for the Execution Client
  • [SPARK-17541] - fix some DDL bugs about table management when a same-name temp view exists
  • [SPARK-17545] - Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
  • [SPARK-17546] - start-* scripts should use hostname -f
  • [SPARK-17547] - Temporary shuffle data files may be leaked following exception in write
  • [SPARK-17548] - Word2VecModel.findSynonyms can spuriously reject the best match when invoked with a vector
  • [SPARK-17567] - Broken link to Spark paper
  • [SPARK-17571] - AssertOnQuery.condition should be consistent in requiring Boolean return type
  • [SPARK-17599] - Folder deletion after globbing may fail StructuredStreaming jobs
  • [SPARK-17613] - PartitioningAwareFileCatalog.allFiles doesn't handle URI specified path at parent
  • [SPARK-17616] - Getting "java.lang.RuntimeException: Distinct columns cannot exist in Aggregate"
  • [SPARK-17617] - Remainder(%) expression.eval returns incorrect result
  • [SPARK-17618] - DataFrame except returns incorrect results when combined with coalesce
  • [SPARK-17627] - Streaming Providers should be labeled Experimental
  • [SPARK-17641] - collect_set should ignore null values
  • [SPARK-17644] - The failed stage is never resubmitted due to abort stage in another thread
  • [SPARK-17650] - Adding a malformed URL to sc.addJar and/or sc.addFile bricks Executors
  • [SPARK-17652] - Fix confusing exception message while reserving capacity
  • [SPARK-17666] - take() or isEmpty() on dataset leaks s3a connections
  • [SPARK-17672] - Spark 2.0 history server web UI takes too long for a single application
  • [SPARK-17673] - Reused Exchange Aggregations Produce Incorrect Results
  • [SPARK-17752] - Spark returns incorrect result when 'collect()'ing a cached Dataset with many columns
  • [SPARK-17809] - scala.MatchError: BooleanType when casting a struct

Documentation

  • [SPARK-16295] - Extract SQL programming guide example snippets from source files instead of hard-coding them
  • [SPARK-16761] - Fix doc link in docs/ml-guide.md
  • [SPARK-16911] - Remove migrating to a Spark 1.x version in programming guide documentation
  • [SPARK-17085] - Documentation and actual code differ - Unsupported Operations
  • [SPARK-17089] - Remove API doc link for mapReduceTriplets because it was removed from the API
  • [SPARK-17242] - Update links of external dstream projects
  • [SPARK-17561] - DataFrameWriter documentation formatting problems
  • [SPARK-17575] - Make correction in configuration documentation table tags

Improvement

  • [SPARK-2424] - ApplicationState.MAX_NUM_RETRY should be configurable
  • [SPARK-10835] - Word2Vec should accept non-null string array, in addition to existing null string array
  • [SPARK-12370] - Documentation should link to examples from its own release version
  • [SPARK-13286] - JDBC driver doesn't report full exception
  • [SPARK-15639] - Try to push down filter at RowGroups level for parquet reader
  • [SPARK-15703] - Make ListenerBus event queue size configurable
  • [SPARK-15923] - Spark Application REST API returns "no such app: <appId>"
  • [SPARK-16216] - CSV data source does not write date and timestamp correctly
  • [SPARK-16240] - model loading backward compatibility for ml.clustering.LDA
  • [SPARK-16320] - Document G1 heap region's effect on Spark 2.0 vs 1.6
  • [SPARK-16324] - regexp_extract should document that it returns an empty string when the match fails
  • [SPARK-16568] - update SQL programming guide refreshTable API
  • [SPARK-16650] - Improve documentation of spark.task.maxFailures
  • [SPARK-16651] - Document no exception using DataFrame.withColumnRenamed when existing column doesn't exist
  • [SPARK-16663] - desc table should be consistent between data source and Hive serde tables
  • [SPARK-16764] - Recommend disabling vectorized parquet reader on OutOfMemoryError
  • [SPARK-16772] - Correct API doc references to PySpark classes + formatting fixes
  • [SPARK-16796] - Visible passwords on Spark environment page
  • [SPARK-16805] - Log timezone when query result does not match
  • [SPARK-16812] - Open up SparkILoop.getAddedJars
  • [SPARK-16813] - Remove private[sql] and private[spark] from catalyst package
  • [SPARK-16870] - Add "spark.sql.broadcastTimeout" to docs/sql-programming-guide.md to help people fix this timeout error when it happens
  • [SPARK-16875] - Add args checking for Dataset randomSplit and sample
  • [SPARK-16877] - Add a rule to prevent use of Java's Override annotation
  • [SPARK-16932] - Programming-guide Accumulator section should be clearer w.r.t. the new API
  • [SPARK-16935] - Verification of Function-related ExternalCatalog APIs
  • [SPARK-16947] - Support type coercion and foldable expression for inline tables
  • [SPARK-16964] - Remove private[sql] and private[spark] from sql.execution package
  • [SPARK-17023] - Update Kafka connector to use Kafka 0.10.0.1
  • [SPARK-17063] - MSCK REPAIR TABLE is super slow with Hive metastore
  • [SPARK-17084] - Rename ParserUtils.assert to validate
  • [SPARK-17186] - remove catalog table type INDEX
  • [SPARK-17193] - HadoopRDD NPE at DEBUG log level when getLocationInfo == null
  • [SPARK-17231] - Avoid building debug or trace log messages unless the respective log level is enabled
  • [SPARK-17246] - Support BigDecimal literal parsing
  • [SPARK-17279] - better error message for exceptions during ScalaUDF execution
  • [SPARK-17297] - Clarify window/slide duration as absolute time, not relative to a calendar
  • [SPARK-17301] - Remove unused classTag field from AtomicType base class
  • [SPARK-17316] - Don't block StandaloneSchedulerBackend.executorRemoved
  • [SPARK-17347] - Encoder in Dataset example has incorrect type
  • [SPARK-17378] - Upgrade snappy-java to 1.1.2.6
  • [SPARK-17421] - Document warnings about "MaxPermSize" parameter when building with Maven and Java 8
  • [SPARK-17445] - Reference an ASF page as the main place to find third-party packages
  • [SPARK-17480] - CompressibleColumnBuilder inefficiently calls gatherCompressibilityStats
  • [SPARK-17483] - Minor refactoring and cleanup in BlockManager block status reporting and block removal
  • [SPARK-17484] - Race condition when cancelling a job during a cache write can lead to block fetch failures
  • [SPARK-17485] - Failed remote cached block reads can lead to whole job failure
  • [SPARK-17486] - Remove unused TaskMetricsUIData.updatedBlockStatuses field
  • [SPARK-17558] - Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
  • [SPARK-17569] - Don't recheck existence of files when generating File Relation resolution in StructuredStreaming
  • [SPARK-17577] - SparkR: support adding files to a Spark job and getting them on executors
  • [SPARK-17609] - SessionCatalog.tableExists should not check temp view
  • [SPARK-17638] - Stop JVM StreamingContext when the Python process is dead
  • [SPARK-17640] - Avoid using -1 as the default batchId for FileStreamSource.FileEntry
  • [SPARK-17649] - Log how many Spark events got dropped in LiveListenerBus
  • [SPARK-17651] - Automate Spark version update for documentation
  • [SPARK-18391] - Openstack deployment scenarios

New Feature

Question

To depend on the 2.0.1 release from a Maven build, declare:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.1</version>
</dependency>

The published artifacts are at https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/
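For sbt users, the equivalent coordinate is "org.apache.spark" %% "spark-core" % "2.0.1". Below is a minimal, illustrative Scala sketch for confirming which Spark version actually ends up on the classpath after the upgrade; the app name and the local[*] master are assumptions for a quick local check, not anything prescribed by the release notes.

import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally just to verify the dependency resolution.
    val conf = new SparkConf()
      .setAppName("version-check") // illustrative name
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(s"Spark version: ${sc.version}") // expect "2.0.1"
    sc.stop()
  }
}

The sketch deliberately uses only spark-core, so the single dependency shown above is enough to compile and run it.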