Latest changes in Spark 2.0.1

Sub-task

Bug

  • [SPARK-10683] - Source code missing for SparkR test JAR
  • [SPARK-11227] - Spark 1.5+ in HDFS HA mode throws java.net.UnknownHostException: nameservice1
  • [SPARK-12666] - spark-shell --packages cannot load artifacts which are publishLocal'd by SBT
  • [SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
  • [SPARK-14209] - Application failure during preemption
  • [SPARK-14818] - Move sketch and mllibLocal out from mima exclusion
  • [SPARK-15083] - History Server would OOM due to unlimited TaskUIData in some stages
  • [SPARK-15285] - Generated SpecificSafeProjection.apply method grows beyond 64 KB
  • [SPARK-15382] - monotonicallyIncreasingId doesn't work when data is upsampled
  • [SPARK-15390] - Memory management issue in complex DataFrame join and filter
  • [SPARK-15541] - SparkContext.stop throws error
  • [SPARK-15869] - HTTP 500 and NPE on streaming batch details page
  • [SPARK-15899] - file scheme should be used correctly
  • [SPARK-15989] - PySpark SQL python-only UDTs don't support nested types
  • [SPARK-16062] - PySpark SQL python-only UDTs don't work well
  • [SPARK-16321] - [Spark 2.0] Performance regression when reading parquet and using PPD and non-vectorized reader
  • [SPARK-16334] - SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException
  • [SPARK-16409] - regexp_extract with optional groups causes NPE
  • [SPARK-16439] - Incorrect information in SQL Query details
  • [SPARK-16440] - Undeleted broadcast variables in Word2Vec causing OOM for long runs
  • [SPARK-16457] - Wrong messages when CTAS with a Partition By clause
  • [SPARK-16460] - Spark 2.0 CSV ignores NULL value in Date format
  • [SPARK-16462] - Spark 2.0 CSV does not cast null values to certain data types properly
  • [SPARK-16522] - [MESOS] Spark application throws exception on exit
  • [SPARK-16533] - Spark application not handling preemption messages
  • [SPARK-16550] - Caching data with replication doesn't replicate data
  • [SPARK-16558] - examples/mllib/LDAExample should use MLVector instead of MLlib Vector
  • [SPARK-16563] - Repeated Spark SQL Thrift server fetchResults calls return empty for the ExecuteStatement operation
  • [SPARK-16586] - spark-class crashes with "[: too many arguments" instead of displaying the correct error message
  • [SPARK-16597] - DataFrame DateType is written as an int (days since epoch) by the CSV writer
  • [SPARK-16610] - When writing ORC files, orc.compress should not be overridden if users do not set "compression" in the options
  • [SPARK-16613] - RDD.pipe returns values for empty partitions
  • [SPARK-16632] - Vectorized parquet reader fails to read certain fields from Hive tables
  • [SPARK-16633] - lag/lead using constant input values does not return the default value when the offset row does not exist
  • [SPARK-16634] - GenericArrayData can't be loaded in certain JVMs
  • [SPARK-16639] - query fails if HAVING condition contains grouping column
  • [SPARK-16642] - ResolveWindowFrame should not be triggered on UnresolvedFunctions
  • [SPARK-16644] - constraints propagation may fail the query
  • [SPARK-16646] - LEAST doesn't accept numeric arguments with different data types
  • [SPARK-16648] - LAST_VALUE(FALSE) OVER () throws IndexOutOfBoundsException
  • [SPARK-16656] - CreateTableAsSelectSuite is flaky
  • [SPARK-16664] - Spark 1.6.2 - Persist call on DataFrames with more than 200 columns is wiping out the data
  • [SPARK-16672] - SQLBuilder should not raise exceptions on EXISTS queries
  • [SPARK-16686] - Dataset.sample with seed: result seems to depend on downstream usage
  • [SPARK-16698] - json parsing regression - "." in keys
  • [SPARK-16699] - Fix performance bug in hash aggregate on long string keys
  • [SPARK-16700] - StructType doesn't accept Python dicts anymore
  • [SPARK-16703] - Extra space in WindowSpecDefinition SQL representation
  • [SPARK-16711] - YarnShuffleService doesn't re-init properly on YARN rolling upgrade
  • [SPARK-16714] - Fail to create decimal arrays with literals having different inferred precisions and scales
  • [SPARK-16715] - Fix a potential ExprId conflict for SubexpressionEliminationSuite."Semantic equals and hash"
  • [SPARK-16721] - Lead/lag needs to respect nulls
  • [SPARK-16724] - Expose DefinedByConstructorParams
  • [SPARK-16729] - Spark should throw analysis exception for invalid casts to date type
  • [SPARK-16730] - Spark 2.0 breaks various Hive cast functions
  • [SPARK-16740] - joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
  • [SPARK-16748] - Errors thrown by UDFs cause TreeNodeException when the query has an ORDER BY clause
  • [SPARK-16750] - ML GaussianMixture training failed due to feature column type mistake
  • [SPARK-16751] - Upgrade derby to 10.12.1.1 from 10.11.1.1
  • [SPARK-16770] - Spark shell not usable with German keyboard due to JLine version
  • [SPARK-16781] - java launched by PySpark as gateway may not be the same java used in the Spark environment
  • [SPARK-16785] - dapply doesn't return array or raw columns
  • [SPARK-16787] - SparkContext.addFile() should not fail if called twice with the same file
  • [SPARK-16791] - casting structs fails on Timestamp fields (interpreted mode only)
  • [SPARK-16802] - joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
  • [SPARK-16818] - Exchange reuse incorrectly reuses scans over different sets of partitions
  • [SPARK-16831] - CrossValidator reports incorrect avgMetrics
  • [SPARK-16836] - Hive date/time function error
  • [SPARK-16837] - TimeWindow incorrectly drops slideDuration in constructors
  • [SPARK-16850] - Improve error message for greatest/least
  • [SPARK-16873] - force spill NPE
  • [SPARK-16880] - Improve ANN training, add training data persist if needed
  • [SPARK-16883] - SQL decimal type is not properly cast to number when collecting SparkDataFrame
  • [SPARK-16901] - Hive settings in hive-site.xml may be overridden by Hive's default values
  • [SPARK-16905] - Support SQL DDL: MSCK REPAIR TABLE
  • [SPARK-16907] - Parquet table reading performance regression when vectorized record reader is not used
  • [SPARK-16922] - Query with Broadcast Hash join fails due to executor OOM in Spark 2.0
  • [SPARK-16925] - Spark tasks which cause JVM to exit with a zero exit code may cause app to hang in Standalone mode
  • [SPARK-16926] - Partition columns are present in columns metadata for partition but not table
  • [SPARK-16936] - Case Sensitivity Support for Refresh Temp Table
  • [SPARK-16942] - CREATE TABLE LIKE generates an External table when the source table is an External Hive Serde table
  • [SPARK-16943] - CREATE TABLE LIKE generates a non-empty table when the source is a data source table
  • [SPARK-16950] - fromOffsets parameter in Kafka's Direct Streams does not work in python3
  • [SPARK-16953] - Make requestTotalExecutors public to be consistent with requestExecutors/killExecutors
  • [SPARK-16955] - Using ordinals in ORDER BY causes an analysis error when the query has a GROUP BY clause using ordinals
  • [SPARK-16959] - Table Comment in the CatalogTable returned from HiveMetastore is Always Empty
  • [SPARK-16961] - Utils.randomizeInPlace does not shuffle arrays uniformly
  • [SPARK-16966] - App Name is a randomUUID even when "spark.app.name" exists
  • [SPARK-16975] - Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.6.2
  • [SPARK-16991] - Full outer join followed by inner join produces wrong results
  • [SPARK-16994] - Filter and limit are illegally permuted
  • [SPARK-16995] - TreeNodeException when flat mapping RelationalGroupedDataset created from DataFrame containing a column created with lit/expr
  • [SPARK-17010] - [MINOR] Wrong description in memory management document
  • [SPARK-17013] - negative numeric literal parsing
  • [SPARK-17016] - group-by/order-by ordinal should throw AnalysisException instead of UnresolvedException
  • [SPARK-17022] - Potential deadlock in driver handling message
  • [SPARK-17027] - PolynomialExpansion.choose is prone to integer overflow
  • [SPARK-17038] - StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
  • [SPARK-17051] - we should use hadoopConf in InsertIntoHiveTable
  • [SPARK-17056] - Fix a wrong assert in MemoryStore
  • [SPARK-17061] - Incorrect results returned following a join of two datasets and a map step where total number of columns >100
  • [SPARK-17065] - Improve the error message when encountering an incompatible DataSourceRegister
  • [SPARK-17066] - dateFormat should be used when writing DataFrames as CSV files
  • [SPARK-17086] - QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data
  • [SPARK-17093] - Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
  • [SPARK-17098] - "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
  • [SPARK-17099] - Incorrect result when HAVING clause is added to group by query
  • [SPARK-17100] - pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
  • [SPARK-17104] - LogicalRelation.newInstance should follow the semantics of MultiInstanceRelation
  • [SPARK-17110] - Pyspark with locality ANY throws java.io.StreamCorruptedException
  • [SPARK-17113] - Job failure due to Executor OOM in offheap mode
  • [SPARK-17114] - Adding a 'GROUP BY 1' where first column is literal results in wrong answer
  • [SPARK-17115] - Improve the performance of UnsafeProjection for wide table
  • [SPARK-17117] - `SELECT 1 / NULL` throws AnalysisException, while `SELECT 1 * NULL` works
  • [SPARK-17120] - Analyzer incorrectly optimizes plan to empty LocalRelation
  • [SPARK-17124] - RelationalGroupedDataset.agg should be order preserving and allow duplicate column names
  • [SPARK-17158] - Improve error message for numeric literal parsing
  • [SPARK-17160] - GetExternalRowField does not properly escape field names, causing generated code not to compile
  • [SPARK-17162] - Range does not support SQL generation
  • [SPARK-17167] - Issue Exceptions when Analyze Table on In-Memory Cataloged Tables
  • [SPARK-17180] - Unable to Alter the Temporary View Using ALTER VIEW command
  • [SPARK-17182] - CollectList and CollectSet should be marked as non-deterministic
  • [SPARK-17194] - When emitting SQL for string literals Spark should use single quotes, not double
  • [SPARK-17205] - Literal.sql does not properly convert NaN and Infinity literals
  • [SPARK-17210] - sparkr.zip is not distributed to executors when running SparkR in RStudio
  • [SPARK-17211] - Broadcast join produces incorrect results when compressed OOPs differ between driver and executor
  • [SPARK-17216] - Event timeline for a stage doesn't cover 100% of the timeline bar in Chrome
  • [SPARK-17228] - Not infer/propagate non-deterministic constraints
  • [SPARK-17230] - Writing decimal to csv will result in an empty string if the decimal exceeds (20, 18)
  • [SPARK-17243] - Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
  • [SPARK-17244] - Joins should not pushdown non-deterministic conditions
  • [SPARK-17252] - Performing arithmetic in VALUES can lead to ClassCastException / MatchErrors during query parsing
  • [SPARK-17253] - Left join where ON clause does not reference the right table produces analysis error
  • [SPARK-17261] - Using HiveContext after re-creating SparkContext in Spark 2.0 throws "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext"
  • [SPARK-17264] - DataStreamWriter should document that it only supports Parquet for now
  • [SPARK-17296] - Spark SQL: cross join + two joins = BUG
  • [SPARK-17299] - TRIM/LTRIM/RTRIM strips characters other than spaces
  • [SPARK-17306] - QuantileSummaries doesn't compress
  • [SPARK-17309] - ALTER VIEW should throw exception if the view does not exist
  • [SPARK-17323] - ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.
  • [SPARK-17335] - Creating Hive table from Spark data
  • [SPARK-17336] - Repeated calls to sbin/spark-config.sh cause duplicate ${PYTHONPATH} entries
  • [SPARK-17339] - Fix SparkR tests on Windows
  • [SPARK-17342] - Style of event timeline is broken
  • [SPARK-17352] - Executor computing time can be negative because of a calculation error
  • [SPARK-17353] - CREATE TABLE LIKE statements when Source is a VIEW
  • [SPARK-17354] - java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date
  • [SPARK-17355] - Work around exception thrown by HiveResultSetMetaData.isSigned
  • [SPARK-17356] - A large Metadata field in Alias can cause OOM when calling TreeNode.toJSON
  • [SPARK-17358] - Cached table (parquet/orc) should be shared between beelines
  • [SPARK-17364] - Cannot query Hive table starting with a number
  • [SPARK-17369] - MetastoreRelation toJSON throws exception
  • [SPARK-17370] - Shuffle service files not invalidated when a slave is lost
  • [SPARK-17376] - Spark version should be available in R
  • [SPARK-17391] - Fix Two Test Failures After Backport
  • [SPARK-17396] - Thread count keeps increasing when querying an external CSV partitioned table
  • [SPARK-17418] - Spark release must NOT distribute Kinesis related assembly artifact
  • [SPARK-17438] - Master UI should show the correct core limit when `ApplicationInfo.executorLimit` is set
  • [SPARK-17439] - QuantileSummaries returns the wrong result after compression
  • [SPARK-17442] - Additional arguments in write.df are not passed to data source
  • [SPARK-17463] - Serialization of accumulators in heartbeats is not thread-safe
  • [SPARK-17465] - Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
  • [SPARK-17474] - Python UDF does not work between Sort and Limit
  • [SPARK-17491] - MemoryStore.putIteratorAsBytes() may silently lose values when KryoSerializer is used
  • [SPARK-17494] - Floor/ceil of decimal returns wrong result if it's in compact format
  • [SPARK-17502] - Multiple Bugs in DDL Statements on Temporary Views
  • [SPARK-17503] - Memory leak in MemoryStore when unable to cache the whole RDD in memory
  • [SPARK-17511] - Dynamic allocation race condition: Containers getting marked failed while releasing
  • [SPARK-17512] - Specifying remote files for Python based Spark jobs in Yarn cluster mode not working
  • [SPARK-17514] - df.take(1) and df.limit(1).collect() perform differently in Python
  • [SPARK-17515] - CollectLimit.execute() should perform per-partition limits
  • [SPARK-17521] - Error when using sparkContext.makeRDD(Seq())
  • [SPARK-17525] - SparkContext.clearFiles() still present in the PySpark bindings though the underlying Scala method was removed in Spark 2.0
  • [SPARK-17531] - Don't initialize Hive Listeners for the Execution Client
  • [SPARK-17541] - fix some DDL bugs about table management when a same-name temp view exists
  • [SPARK-17545] - Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
  • [SPARK-17546] - start-* scripts should use hostname -f
  • [SPARK-17547] - Temporary shuffle data files may be leaked following exception in write
  • [SPARK-17548] - Word2VecModel.findSynonyms can spuriously reject the best match when invoked with a vector
  • [SPARK-17567] - Broken link to Spark paper
  • [SPARK-17571] - AssertOnQuery.condition should be consistent in requiring Boolean return type
  • [SPARK-17599] - Folder deletion after globbing may fail StructuredStreaming jobs
  • [SPARK-17613] - PartitioningAwareFileCatalog.allFiles doesn't handle URI specified path at parent
  • [SPARK-17616] - Getting "java.lang.RuntimeException: Distinct columns cannot exist in Aggregate"
  • [SPARK-17617] - Remainder(%) expression.eval returns incorrect result
  • [SPARK-17618] - DataFrame except returns incorrect results when combined with coalesce
  • [SPARK-17627] - Streaming Providers should be labeled Experimental
  • [SPARK-17641] - collect_set should ignore null values
  • [SPARK-17644] - The failed stage is never resubmitted due to abort stage in another thread
  • [SPARK-17650] - Adding a malformed URL to sc.addJar and/or sc.addFile bricks Executors
  • [SPARK-17652] - Fix confusing exception message while reserving capacity
  • [SPARK-17666] - take() or isEmpty() on dataset leaks s3a connections
  • [SPARK-17672] - Spark 2.0 history server web UI takes too long for a single application
  • [SPARK-17673] - Reused Exchange Aggregations Produce Incorrect Results
  • [SPARK-17752] - Spark returns incorrect result when 'collect()'ing a cached Dataset with many columns
  • [SPARK-17809] - scala.MatchError: BooleanType when casting a struct

Documentation

  • [SPARK-16295] - Extract SQL programming guide example snippets from source files instead of hard-coding them
  • [SPARK-16761] - Fix doc link in docs/ml-guide.md
  • [SPARK-16911] - Remove migrating to a Spark 1.x version in programming guide documentation
  • [SPARK-17085] - Documentation and actual code differ - Unsupported Operations
  • [SPARK-17089] - Remove API doc link for mapReduceTriplets because it was removed from the API
  • [SPARK-17242] - Update links of external dstream projects
  • [SPARK-17561] - DataFrameWriter documentation formatting problems
  • [SPARK-17575] - Make correction in configuration documentation table tags

Improvement

  • [SPARK-2424] - ApplicationState.MAX_NUM_RETRY should be configurable
  • [SPARK-10835] - Word2Vec should accept non-null string array, in addition to existing null string array
  • [SPARK-12370] - Documentation should link to examples from its own release version
  • [SPARK-13286] - JDBC driver doesn't report full exception
  • [SPARK-15639] - Try to push down filter at RowGroups level for parquet reader
  • [SPARK-15703] - Make ListenerBus event queue size configurable
  • [SPARK-15923] - Spark Application REST API returns "no such app: <appId>"
  • [SPARK-16216] - CSV data source does not write date and timestamp correctly
  • [SPARK-16240] - model loading backward compatibility for ml.clustering.LDA
  • [SPARK-16320] - Document G1 heap region's effect on Spark 2.0 vs 1.6
  • [SPARK-16324] - regexp_extract should document that it returns an empty string when the match fails
  • [SPARK-16568] - update SQL programming guide refreshTable API
  • [SPARK-16650] - Improve documentation of spark.task.maxFailures
  • [SPARK-16651] - Document no exception using DataFrame.withColumnRenamed when existing column doesn't exist
  • [SPARK-16663] - desc table should be consistent between data source and Hive serde tables
  • [SPARK-16764] - Recommend disabling vectorized parquet reader on OutOfMemoryError
  • [SPARK-16772] - Correct API doc references to PySpark classes + formatting fixes
  • [SPARK-16796] - Visible passwords on Spark environment page
  • [SPARK-16805] - Log timezone when query result does not match
  • [SPARK-16812] - Open up SparkILoop.getAddedJars
  • [SPARK-16813] - Remove private[sql] and private[spark] from catalyst package
  • [SPARK-16870] - Add "spark.sql.broadcastTimeout" to docs/sql-programming-guide.md to help people fix this timeout error when it happens
  • [SPARK-16875] - Add args checking for Dataset randomSplit and sample
  • [SPARK-16877] - Add a rule to prevent use of Java's Override annotation
  • [SPARK-16932] - Programming-guide Accumulator section should be clearer w.r.t. the new API
  • [SPARK-16935] - Verification of Function-related ExternalCatalog APIs
  • [SPARK-16947] - Support type coercion and foldable expression for inline tables
  • [SPARK-16964] - Remove private[sql] and private[spark] from sql.execution package
  • [SPARK-17023] - Update Kafka connector to use Kafka 0.10.0.1
  • [SPARK-17063] - MSCK REPAIR TABLE is super slow with Hive metastore
  • [SPARK-17084] - Rename ParserUtils.assert to validate
  • [SPARK-17186] - remove catalog table type INDEX
  • [SPARK-17193] - HadoopRDD NPE at DEBUG log level when getLocationInfo == null
  • [SPARK-17231] - Avoid building debug or trace log messages unless the respective log level is enabled
  • [SPARK-17246] - Support BigDecimal literal parsing
  • [SPARK-17279] - better error message for exceptions during ScalaUDF execution
  • [SPARK-17297] - Clarify window/slide duration as absolute time, not relative to a calendar
  • [SPARK-17301] - Remove unused classTag field from AtomicType base class
  • [SPARK-17316] - Don't block StandaloneSchedulerBackend.executorRemoved
  • [SPARK-17347] - Encoder in Dataset example has incorrect type
  • [SPARK-17378] - Upgrade snappy-java to 1.1.2.6
  • [SPARK-17421] - Document warnings about "MaxPermSize" parameter when building with Maven and Java 8
  • [SPARK-17445] - Reference an ASF page as the main place to find third-party packages
  • [SPARK-17480] - CompressibleColumnBuilder inefficiently calls gatherCompressibilityStats
  • [SPARK-17483] - Minor refactoring and cleanup in BlockManager block status reporting and block removal
  • [SPARK-17484] - Race condition when cancelling a job during a cache write can lead to block fetch failures
  • [SPARK-17485] - Failed remote cached block reads can lead to whole job failure
  • [SPARK-17486] - Remove unused TaskMetricsUIData.updatedBlockStatuses field
  • [SPARK-17558] - Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
  • [SPARK-17569] - Don't recheck existence of files when generating File Relation resolution in StructuredStreaming
  • [SPARK-17577] - SparkR: support adding files to a Spark job and getting them on executors
  • [SPARK-17609] - SessionCatalog.tableExists should not check temp view
  • [SPARK-17638] - Stop JVM StreamingContext when the Python process is dead
  • [SPARK-17640] - Avoid using -1 as the default batchId for FileStreamSource.FileEntry
  • [SPARK-17649] - Log how many Spark events got dropped in LiveListenerBus
  • [SPARK-17651] - Automate Spark version update for documentation
  • [SPARK-18391] - Openstack deployment scenarios

New Feature

Question

To depend on the 2.0.1 release from a Maven build, declare:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.1</version>
</dependency>

The published artifacts are at https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/
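For sbt users, the equivalent coordinate is "org.apache.spark" %% "spark-core" % "2.0.1". Below is a minimal, illustrative Scala sketch for confirming which Spark version actually ends up on the classpath after the upgrade; the app name and the local[*] master are assumptions for a quick local check, not anything prescribed by the release notes.

import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally just to verify the dependency resolution.
    val conf = new SparkConf()
      .setAppName("version-check") // illustrative name
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(s"Spark version: ${sc.version}") // expect "2.0.1"
    sc.stop()
  }
}

The sketch deliberately uses only spark-core, so the single dependency shown above is enough to compile and run it.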