Sub-task

  • [​​SPARK-16963​​] - Change Source API so that sources do not need to keep unbounded state
  • [​​SPARK-17346​​] - Kafka 0.10 support in Structured Streaming
  • [​​SPARK-17731​​] - Metrics for Structured Streaming
  • [​​SPARK-17790​​] - Support for parallelizing R data.frame larger than 2GB
  • [​​SPARK-17812​​] - More granular control of starting offsets (assign)
  • [​​SPARK-17813​​] - Maximum data per trigger
  • [​​SPARK-17834​​] - Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer
  • [​​SPARK-17926​​] - Add methods to convert StreamingQueryStatus to json
  • [​​SPARK-18143​​] - History Server is broken because of the refactoring work in Structured Streaming
  • [​​SPARK-18154​​] - CLONE - Change Source API so that sources do not need to keep unbounded state
  • [​​SPARK-18164​​] - ForeachSink should fail the Spark job if `process` throws exception

Bug

  • [​​SPARK-13747​​] - Concurrent execution in SQL doesn't work with Scala ForkJoinPool
  • [​​SPARK-16304​​] - LinkageError should not crash Spark executor
  • [​​SPARK-16804​​] - Correlated subqueries containing non-deterministic operators return incorrect results
  • [​​SPARK-16988​​] - spark history server log needs to be fixed to show https url when ssl is enabled
  • [​​SPARK-17112​​] - "select if(true, null, null)" via JDBC triggers IllegalArgumentException in Thriftserver
  • [​​SPARK-17123​​] - Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile
  • [​​SPARK-17153​​] - [Structured streams] readStream ignores partition columns
  • [SPARK-17337] - Incomplete algorithm for name resolution in Catalyst parser may lead to incorrect result
  • [​​SPARK-17417​​] - Fix sorting of part files while reconstructing RDD/partition from checkpointed files.
  • [​​SPARK-17549​​] - InMemoryRelation doesn't scale to large tables
  • [​​SPARK-17559​​] - PeriodicGraphCheckpointer did not persist edges as expected in some cases
  • [​​SPARK-17587​​] - SparseVector __getitem__ should follow __getitem__ contract
  • [​​SPARK-17612​​] - Support `DESCRIBE table PARTITION` SQL syntax
  • [​​SPARK-17643​​] - Remove comparable requirement from Offset
  • [​​SPARK-17697​​] - BinaryLogisticRegressionSummary, GLM Summary should handle non-Double numeric types
  • [​​SPARK-17698​​] - Join predicates should not contain filter clauses
  • [​​SPARK-17707​​] - Web UI prevents spark-submit application to be finished
  • [​​SPARK-17712​​] - Incorrect result due to invalid pushdown of data-independent filter beneath aggregate
  • [​​SPARK-17721​​] - Erroneous computation in multiplication of transposed SparseMatrix with SparseVector
  • [​​SPARK-17733​​] - InferFiltersFromConstraints rule never terminates for query
  • [​​SPARK-17750​​] - Cannot create view which includes interval arithmetic
  • [​​SPARK-17753​​] - Simple case in spark sql throws ParseException
  • [​​SPARK-17758​​] - Spark Aggregate function LAST returns null on an empty partition
  • [​​SPARK-17782​​] - Kafka 010 test is flaky
  • [​​SPARK-17792​​] - L-BFGS solver for linear regression does not accept general numeric label column types
  • [​​SPARK-17798​​] - Remove redundant Experimental annotations in sql.streaming package
  • [​​SPARK-17805​​] - sqlContext.read.text() does not work with a list of paths
  • [SPARK-17806] - Incorrect result when working with data from Parquet
  • [​​SPARK-17808​​] - BinaryType fails in Python 3 due to outdated Pyrolite
  • [​​SPARK-17810​​] - Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
  • [​​SPARK-17811​​] - SparkR cannot parallelize data.frame with NA or NULL in Date columns
  • [SPARK-17816] - JSON serialization of accumulators fails with ConcurrentModificationException
  • [​​SPARK-17818​​] - Cannot SELECT NULL
  • [​​SPARK-17819​​] - Specified database in JDBC URL is ignored when connecting to thriftserver
  • [​​SPARK-17832​​] - TableIdentifier.quotedString creates un-parseable names when name contains a backtick
  • [​​SPARK-17841​​] - Kafka 0.10 commitQueue needs to be drained
  • [​​SPARK-17853​​] - Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.
  • [SPARK-17859] - persist should not impede Spark's ability to perform a broadcast join.
  • [SPARK-17863] - SELECT DISTINCT does not work if there is an ORDER BY clause
  • [​​SPARK-17876​​] - Write StructuredStreaming WAL to a stream instead of materializing all at once
  • [​​SPARK-17880​​] - The url linking to `AccumulatorV2` in the document is incorrect.
  • [​​SPARK-17882​​] - RBackendHandler swallowing errors
  • [​​SPARK-17884​​] - In the cast expression, casting from empty string to interval type throws NullPointerException
  • [​​SPARK-17892​​] - Query in CTAS is Optimized Twice (branch-2.0)
  • [​​SPARK-17929​​] - Deadlock when AM restart and send RemoveExecutor on reset
  • [​​SPARK-17986​​] - SQLTransformer leaks temporary tables
  • [​​SPARK-17989​​] - Check ascendingOrder type in sort_array function ahead
  • [SPARK-18001] - Broken link to R DataFrame in sql-programming-guide
  • [SPARK-18003] - RDD zipWithIndex generates wrong results when one partition contains more than 2147483647 records.
  • [​​SPARK-18009​​] - Spark 2.0.1 SQL Thrift Error
  • [​​SPARK-18022​​] - java.lang.NullPointerException instead of real exception when saving DF to MySQL
  • [​​SPARK-18030​​] - Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite
  • [​​SPARK-18034​​] - Upgrade to MiMa 0.1.11
  • [SPARK-18058] - AnalysisException may be thrown when unioning two DFs whose struct fields have different nullability
  • [​​SPARK-18063​​] - Failed to infer constraints over multiple aliases
  • [​​SPARK-18070​​] - binary operator should not consider nullability when comparing input types
  • [​​SPARK-18093​​] - Fix default value test in SQLConfSuite to work regardless of warehouse dir's existence
  • [SPARK-18114] - MesosClusterScheduler generates bad command options
  • [​​SPARK-18132​​] - spark 2.0 branch's spark-release-publish failed because style check failed.
  • [​​SPARK-18148​​] - Misleading Error Message for Aggregation Without Window/GroupBy
  • [​​SPARK-18189​​] - task not serializable with groupByKey() + mapGroups() + map
  • [​​SPARK-18342​​] - HDFSBackedStateStore can fail to rename files causing snapshotting and recovery to fail
  • [SPARK-18358] - Multiple aggregations using 'countDistinct' and 'first' result in an error

Dependency upgrade

Documentation

Improvement

New Feature

Test

  • [​​SPARK-17624​​] - Flaky test? StateStoreSuite maintenance
  • [SPARK-17738] - Flaky test: org.apache.spark.sql.execution.columnar.ColumnTypeSuite MAP append/extract
  • [SPARK-17778] - Mock SparkContext to reduce memory usage of BlockManagerSuite