[Spark Version Update]--Spark-2.0.2

Reposted

high2011 2022-11-03 15:03:46 Blog category: Spark

Sub-task

[SPARK-16963] - Change Source API so that sources do not need to keep unbounded state
[SPARK-17346] - Kafka 0.10 support in Structured Streaming
[SPARK-17731] - Metrics for Structured Streaming
[SPARK-17790] - Support for parallelizing R data.frame larger than 2GB
[SPARK-17812] - More granular control of starting offsets (assign)
[SPARK-17813] - Maximum data per trigger
[SPARK-17834] - Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer
[SPARK-17926] - Add methods to convert StreamingQueryStatus to json
[SPARK-18143] - History Server is broken because of the refactoring work in Structured Streaming
[SPARK-18154] - CLONE - Change Source API so that sources do not need to keep unbounded state
[SPARK-18164] - ForeachSink should fail the Spark job if `process` throws exception
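
To make the two Kafka sub-tasks above (SPARK-17812, finer control of starting offsets via "assign", and SPARK-17813, maximum data per trigger) more concrete, here is a minimal sketch of how such settings are usually passed to the Kafka source as string options. The option names "assign", "startingOffsets" and "maxOffsetsPerTrigger", and the convention that -2 means earliest and -1 means latest, are taken from the later Structured Streaming + Kafka integration guide; whether 2.0.2 exposes exactly these names is an assumption. The helper kafka_source_options, the broker address and the topic name are hypothetical and exist only for illustration.

```python
import json


def kafka_source_options(bootstrap_servers, assign=None,
                         starting_offsets=None, max_offsets_per_trigger=None):
    """Build the string options for a Kafka streaming source (hypothetical helper).

    assign: dict mapping topic -> list of partition numbers to read.
    starting_offsets: "earliest", "latest", or a dict mapping
        topic -> {partition: offset}; by the assumed convention,
        -2 stands for earliest and -1 for latest.
    max_offsets_per_trigger: upper bound on offsets processed per trigger.
    """
    options = {"kafka.bootstrap.servers": bootstrap_servers}
    if assign is not None:
        options["assign"] = json.dumps(assign, sort_keys=True,
                                       separators=(",", ":"))
    if starting_offsets is not None:
        if isinstance(starting_offsets, str):
            options["startingOffsets"] = starting_offsets
        else:
            # json.dumps turns integer partition keys into JSON strings.
            options["startingOffsets"] = json.dumps(
                starting_offsets, sort_keys=True, separators=(",", ":"))
    if max_offsets_per_trigger is not None:
        options["maxOffsetsPerTrigger"] = str(int(max_offsets_per_trigger))
    return options


opts = kafka_source_options(
    "host1:9092",                                   # hypothetical broker
    assign={"topicA": [0, 1]},                      # hypothetical topic
    starting_offsets={"topicA": {0: 23, 1: -2}},    # partition 1 from earliest
    max_offsets_per_trigger=1000,
)
for key in sorted(opts):
    print(key, "=", opts[key])
```

With PySpark available, a dict like this could be handed to the reader along the lines of spark.readStream.format("kafka").options(**opts).load(); that call is not part of the sketch, and the option names should be checked against the guide for the Spark version in use.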

Bug

[SPARK-13747] - Concurrent execution in SQL doesn't work with Scala ForkJoinPool
[SPARK-16304] - LinkageError should not crash Spark executor
[SPARK-16804] - Correlated subqueries containing non-deterministic operators return incorrect results
[SPARK-16988] - spark history server log needs to be fixed to show https url when ssl is enabled
[SPARK-17112] - "select if(true, null, null)" via JDBC triggers IllegalArgumentException in Thriftserver
[SPARK-17123] - Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile
[SPARK-17153] - [Structured streams] readStream ignores partition columns
[SPARK-17337] - Incomplete algorithm for name resolution in Catalyst parser may lead to incorrect result
[SPARK-17417] - Fix sorting of part files while reconstructing RDD/partition from checkpointed files.
[SPARK-17549] - InMemoryRelation doesn't scale to large tables
[SPARK-17559] - PeriodicGraphCheckpointer did not persist edges as expected in some cases
[SPARK-17587] - SparseVector __getitem__ should follow __getitem__ contract
[SPARK-17612] - Support `DESCRIBE table PARTITION` SQL syntax
[SPARK-17643] - Remove comparable requirement from Offset
[SPARK-17697] - BinaryLogisticRegressionSummary, GLM Summary should handle non-Double numeric types
[SPARK-17698] - Join predicates should not contain filter clauses
[SPARK-17707] - Web UI prevents spark-submit application to be finished
[SPARK-17712] - Incorrect result due to invalid pushdown of data-independent filter beneath aggregate
[SPARK-17721] - Erroneous computation in multiplication of transposed SparseMatrix with SparseVector
[SPARK-17733] - InferFiltersFromConstraints rule never terminates for query
[SPARK-17750] - Cannot create view which includes interval arithmetic
[SPARK-17753] - Simple case in spark sql throws ParseException
[SPARK-17758] - Spark Aggregate function LAST returns null on an empty partition
[SPARK-17782] - Kafka 010 test is flaky
[SPARK-17792] - L-BFGS solver for linear regression does not accept general numeric label column types
[SPARK-17798] - Remove redundant Experimental annotations in sql.streaming package
[SPARK-17805] - sqlContext.read.text() does not work with a list of paths
[SPARK-17806] - Incorrect result when work with data from parquet
[SPARK-17808] - BinaryType fails in Python 3 due to outdated Pyrolite
[SPARK-17810] - Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
[SPARK-17811] - SparkR cannot parallelize data.frame with NA or NULL in Date columns
[SPARK-17816] - Json serialization of accumulators are failing with ConcurrentModificationException
[SPARK-17818] - Cannot SELECT NULL
[SPARK-17819] - Specified database in JDBC URL is ignored when connecting to thriftserver
[SPARK-17832] - TableIdentifier.quotedString creates un-parseable names when name contains a backtick
[SPARK-17841] - Kafka 0.10 commitQueue needs to be drained
[SPARK-17853] - Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.
[SPARK-17859] - persist should not impede with spark's ability to perform a broadcast join.
[SPARK-17863] - SELECT distinct does not work if there is a order by clause
[SPARK-17876] - Write StructuredStreaming WAL to a stream instead of materializing all at once
[SPARK-17880] - The url linking to `AccumulatorV2` in the document is incorrect.
[SPARK-17882] - RBackendHandler swallowing errors
[SPARK-17884] - In the cast expression, casting from empty string to interval type throws NullPointerException
[SPARK-17892] - Query in CTAS is Optimized Twice (branch-2.0)
[SPARK-17929] - Deadlock when AM restart and send RemoveExecutor on reset
[SPARK-17986] - SQLTransformer leaks temporary tables
[SPARK-17989] - Check ascendingOrder type in sort_array function ahead
[SPARK-18001] - Broken link to R DataFrame in sql-programming-guide
[SPARK-18003] - RDD zipWithIndex generate wrong result when one partition contains more than 2147483647 records.
[SPARK-18009] - Spark 2.0.1 SQL Thrift Error
[SPARK-18022] - java.lang.NullPointerException instead of real exception when saving DF to MySQL
[SPARK-18030] - Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite
[SPARK-18034] - Upgrade to MiMa 0.1.11
[SPARK-18058] - AnalysisException may be thrown when union two DFs whose struct fields have different nullability
[SPARK-18063] - Failed to infer constraints over multiple aliases
[SPARK-18070] - binary operator should not consider nullability when comparing input types
[SPARK-18093] - Fix default value test in SQLConfSuite to work regardless of warehouse dir's existence
[SPARK-18114] - MesosClusterScheduler generate bad command options
[SPARK-18132] - spark 2.0 branch's spark-release-publish failed because style check failed.
[SPARK-18148] - Misleading Error Message for Aggregation Without Window/GroupBy
[SPARK-18189] - task not serializable with groupByKey() + mapGroups() + map
[SPARK-18342] - HDFSBackedStateStore can fail to rename files causing snapshotting and recovery to fail
[SPARK-18358] - Multiple Aggregation Using 'countDistinct' and 'first' result in error
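
As background for SPARK-17587 in the list above: the Python data model expects __getitem__ on a sequence to raise IndexError for an out-of-range integer index, because that is how a for loop over an object without __iter__ detects the end of the sequence. The class below is a minimal sketch of that contract for a sparse vector. TinySparseVector is a hypothetical stand-in written for illustration; it is not PySpark's SparseVector and does not reproduce the actual patch.

```python
class TinySparseVector:
    """Hypothetical sparse vector that stores only its non-zero entries."""

    def __init__(self, size, entries):
        self.size = size
        self.entries = dict(entries)
        for i in self.entries:
            if not 0 <= i < size:
                raise ValueError("entry index %d outside [0, %d)" % (i, size))

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("indices must be integers")
        if index < 0:
            # Negative indices count from the end, as for built-in sequences.
            index += self.size
        if not 0 <= index < self.size:
            # IndexError (not ValueError) lets iteration terminate cleanly.
            raise IndexError("index out of range")
        return self.entries.get(index, 0.0)


v = TinySparseVector(4, {1: 2.0, 3: 5.0})
dense = list(v)   # iterates via __getitem__ until IndexError is raised
last = v[-1]
print(dense, last)
```

If __getitem__ raised a different exception type for index 4, list(v) would propagate that exception instead of stopping after the fourth element.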

Dependency upgrade

[SPARK-17803] - Docker integration tests don't run with "Docker for Mac"

Documentation

[SPARK-17736] - Update R README for rmarkdown, pandoc
[SPARK-17883] - Possible typo in comments of Row.scala
[SPARK-17953] - Fix typo in SparkSession scaladoc
[SPARK-18104] - Don't build KafkaSource doc

Improvement

[SPARK-16343] - Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition
[SPARK-17751] - Remove spark.sql.eagerAnalysis
[SPARK-17780] - Report NoClassDefFoundError in StreamExecution
[SPARK-17999] - Add getPreferredLocations for KafkaSourceRDD
[SPARK-18044] - FileStreamSource should not infer partitions in every batch

New Feature

[SPARK-17711] - Compress rolled executor logs

Test

[SPARK-17624] - Flaky test? StateStoreSuite maintenance
[SPARK-17738] - Flaky test: org.apache.spark.sql.execution.columnar.ColumnTypeSuite MAP append/extract