http://mysqlserverteam.com/the-mysql-8-0-0-milestone-release-is-available/

The MySQL Development team is happy to announce our 8.0.0 development milestone release (DMR), now available for download at dev.mysql.com. The source code is available at GitHub. You can find the full list of changes and bug fixes in the 8.0.0 Release Notes. Here are the highlights. Enjoy!

Transactional Data Dictionary

MySQL 8.0 will have a real data dictionary implemented as a set of SQL tables stored in InnoDB, and we deliver most of it in this 8.0.0 DMR. The .FRM, .TRG, and .PAR files are gone. Information Schema has been re-implemented as views on data dictionary tables. DDL statements become atomic and crash safe, and reliability is increased. MyISAM is now, in principle, optional, as all system tables have been moved from MyISAM to InnoDB. This blog post will explain the new data dictionary in some depth, but first we present the other features and changes delivered in this 8.0.0 DMR.

SQL roles

Roles (WL#988): This work by Kristofer Petterson implements standard SQL roles in MySQL 8.0. A role is a named collection of privileges. One can create roles, drop roles, grant roles, grant to roles, and decide which roles are applicable during a session. We have also provided an SQL function ROLES_GRAPHML() …
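To make "a named collection of privileges" and "grant to roles" concrete, here is a minimal Python sketch. It assumes roles form a graph in which each role holds privileges directly and may itself be granted other roles, and that a session's effective privileges are everything reachable from its active roles. The class, method, role, and privilege names are hypothetical and do not describe MySQL's implementation.

```python
from typing import Dict, Iterable, Set


class RoleGraph:
    """Hypothetical in-memory model of roles; not MySQL's implementation."""

    def __init__(self) -> None:
        self.privileges: Dict[str, Set[str]] = {}  # role -> privileges granted directly
        self.granted: Dict[str, Set[str]] = {}     # grantee -> roles granted to it

    def create_role(self, role: str) -> None:
        self.privileges.setdefault(role, set())
        self.granted.setdefault(role, set())

    def grant_privilege(self, privilege: str, role: str) -> None:
        self.privileges[role].add(privilege)

    def grant_role(self, role: str, grantee: str) -> None:
        # A role can be granted to another role, so the roles form a graph.
        self.granted.setdefault(grantee, set()).add(role)

    def effective_privileges(self, active_roles: Iterable[str]) -> Set[str]:
        """Union of the privileges reachable from the roles active in a session."""
        result: Set[str] = set()
        seen: Set[str] = set()
        stack = list(active_roles)
        while stack:
            role = stack.pop()
            if role in seen:  # guards against cycles and repeated visits
                continue
            seen.add(role)
            result |= self.privileges.get(role, set())
            stack.extend(self.granted.get(role, set()))
        return result
```

Granting `app_read` to `app_write` means a session that activates only `app_write` still gets `SELECT`; deciding which roles are active is what limits a session's privileges in this model.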

Character Sets

utf8mb4 collations (WL#9108 and WL#9125): This work by Xing Z. Zhang is part of a bigger effort to improve our UTF8 support. The overall goal is to change the default character set from latin1 to utf8mb4, and the default collation from latin1_swedish_ci to utf8mb4_800_ci_ai. UTF8 is the dominant character encoding for the web, and this move will make life easier for the vast majority of MySQL users. In 8.0.0, we have added 21 language-specific, case-insensitive utf8mb4 collations (see WL#9108) in addition to utf8mb4_800_ci_ai (see WL#9125). If time permits, we will also add accent- and case-sensitive collations. The default character set has not been changed in 8.0.0, but we plan to do so before GA.

Upgrade Unicode data to 9.0.0 (WL#9479): This work by Xing Z. Zhang upgrades our collation weight and case-mapping data so that our new collations are built on the latest Unicode data. The Unicode Consortium announced Unicode 9.0.0 on June 21, 2016.

Indexes

Support for INVISIBLE indexes (WL#8697): This work by Martin Hansson adds the capability of toggling the visibility of an index (visible/invisible). An invisible index is not considered by the optimizer when it makes the query execution plan. However, the index is still maintained in the background, so it is cheap to make it visible again. The purpose is to let a DBA or DevOps engineer determine whether an index can be dropped. If you suspect that an index is not being used, you first make it invisible, then monitor query performance, and finally remove the index if no query slowdown is experienced. This feature has been asked for by many users, for example Dropbox. See also the feature request logged as Bug#70299.
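The key property is that visibility affects only plan selection, never index maintenance. The toy Python model below illustrates that split under the assumption of a single-column index stored as a key-to-row-ids map. `Index`, `Table`, and `choose_index` are hypothetical names for illustration, not server internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Index:
    name: str
    column: str
    visible: bool = True
    entries: Dict[object, List[int]] = field(default_factory=dict)  # key -> row ids


class Table:
    """Hypothetical table model illustrating index visibility."""

    def __init__(self, indexes: List[Index]) -> None:
        self.rows: List[dict] = []
        self.indexes = {ix.name: ix for ix in indexes}

    def insert(self, row: dict) -> None:
        row_id = len(self.rows)
        self.rows.append(row)
        # Every index is maintained on writes, whether visible or not.
        for ix in self.indexes.values():
            ix.entries.setdefault(row[ix.column], []).append(row_id)

    def choose_index(self, column: str) -> Optional[Index]:
        # The planner only considers visible indexes.
        for ix in self.indexes.values():
            if ix.visible and ix.column == column:
                return ix
        return None
```

Because inserts keep updating the invisible index, flipping `visible` back to `True` makes it usable immediately, without a rebuild. This is why hiding an index is a cheap, reversible experiment.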

Bit-wise operations on binary data types, IPv6 and UUID manipulation

Bit-wise operations on binary data types (WL#8699): This work by Catalin Besleaga extends our existing bit-wise operations (bitwise AND, etc.), which work with BIGINT, to also work with [VAR]BINARY/[TINY|MEDIUM|LONG]BLOB. Prior to 8.0, bit-wise operations were only supported for integers (and not binaries). If you used bit-wise operations on binaries, the arguments were implicitly cast to BIGINT (64 bit) before the operation, possibly losing bits. From 8.0.0 onward, bit-wise operations work for all BINARY and BLOB data types.

The INET6_ATON() and INET6_NTOA() functions convert IPv6 addresses between text form (like 'fe80::226:b9ff:fe77:eb17') and VARBINARY(16). However, until now we could not combine these IPv6 functions with bit-wise operations, since such operations would wrongly convert the output to BIGINT. With WL#8699 this has been fixed. For example, if we have an IPv6 address and want to test it against a network mask, we can now use INET6_ATON(address) & INET6_ATON(network), because INET6_ATON() correctly returns VARBINARY(16).

UUID manipulation (WL#8920): This work by Catalin Besleaga improves the usability of UUID manipulation by implementing three new SQL functions: UUID_TO_BIN(), BIN_TO_UUID(), and IS_UUID(). The first converts UUID-formatted text to VARBINARY(16), the second converts VARBINARY(16) to UUID-formatted text, and the last checks the validity of UUID-formatted text. A UUID stored as VARBINARY(16) can be indexed using functional indexes. The functions UUID_TO_BIN() and BIN_TO_UUID() can also shuffle the time-related bits and move them to the beginning, making the values index friendly and avoiding random inserts into the B-tree, which reduces insert time. The lack of such functionality has been mentioned as one of the drawbacks of using UUIDs.
MySQL provides functions to convert textual IPv6 and UUID representations to and from the more compact VARBINARY(16) datatype, and now also offers bit-wise operations on VARBINARY(16).
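The Python sketch below mirrors both ideas with the standard `ipaddress` and `uuid` modules. It performs a full-width 16-byte AND for a network-mask check, and it reorders the time fields of a UUID. The helper names only echo the SQL functions and are hypothetical. The swapped layout used here, which exchanges the first and third groups (time-low and time-high), is our understanding of what the swap option does and should be treated as an assumption.

```python
import ipaddress
import uuid


def inet6_aton(text: str) -> bytes:
    """16-byte binary form of an IPv6 address (the VARBINARY(16) analogue)."""
    return ipaddress.IPv6Address(text).packed


def binary_and(a: bytes, b: bytes) -> bytes:
    """Bit-wise AND across the full width, with no truncation to 64 bits."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x & y for x, y in zip(a, b))


def in_network(address: str, network: str, mask: str) -> bool:
    return binary_and(inet6_aton(address), inet6_aton(mask)) == inet6_aton(network)


def uuid_to_bin(text: str, swap: bool = False) -> bytes:
    raw = uuid.UUID(text).bytes  # time_low(4) | time_mid(2) | time_hi(2) | rest(8)
    if swap:
        # Move time_hi to the front and time_low behind time_mid.
        raw = raw[6:8] + raw[4:6] + raw[0:4] + raw[8:]
    return raw


def bin_to_uuid(raw: bytes, swap: bool = False) -> str:
    if swap:
        # Undo the reordering performed by uuid_to_bin(..., swap=True).
        raw = raw[4:8] + raw[2:4] + raw[0:2] + raw[8:]
    return str(uuid.UUID(bytes=raw))
```

With a 64-bit truncation, `binary_and` would silently drop half of each address. Keeping all 16 bytes is what makes the mask test meaningful. For time-based UUIDs, the slowly changing high-order time bits end up first after the swap, so consecutive values tend to land near each other in a B-tree.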

SET PERSIST for global variables

SET PERSIST for global variable values (WL#8688): This work by Satish Bharathy makes it possible to persist global, dynamic server variables. A number of server variables are both GLOBAL and DYNAMIC and can be reconfigured while the server is running. For example, SET GLOBAL sql_mode='STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION'; changes sql_mode at runtime. However, such settings are lost upon a server restart. This work makes it possible to write SET PERSIST sql_mode='STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION'; with the effect that the setting survives a server restart. There are many usage scenarios for this functionality, but most importantly it gives a way to manage server settings when editing the configuration files is inconvenient or not an option. For example, in some hosted environments you don't have file system access; all you have is the ability to connect to one or more servers. As with SET GLOBAL, you need the SUPER privilege for SET PERSIST. … variables_info in the performance_schema database. The variables_info …
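The difference between the two statements comes down to whether the new value is also recorded somewhere that startup reads back. The Python sketch below models this. The JSON file, its name, and the class are assumptions made for illustration; they do not describe the server's actual persistence format.

```python
import json
import os
from typing import Dict


class ServerVariables:
    """Hypothetical model of SET GLOBAL vs. SET PERSIST; not MySQL's implementation."""

    def __init__(self, defaults: Dict[str, str], persist_path: str) -> None:
        self.persist_path = persist_path
        self.values = dict(defaults)
        # Simulated startup: persisted settings override the defaults.
        self.values.update(self._load_persisted())

    def _load_persisted(self) -> Dict[str, str]:
        if not os.path.exists(self.persist_path):
            return {}
        with open(self.persist_path, encoding="utf-8") as f:
            return json.load(f)

    def set_global(self, name: str, value: str) -> None:
        # In memory only: lost when the server restarts.
        self.values[name] = value

    def set_persist(self, name: str, value: str) -> None:
        # Apply now, and record the setting so the next startup re-applies it.
        self.values[name] = value
        persisted = self._load_persisted()
        persisted[name] = value
        tmp_path = self.persist_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(persisted, f)
        os.replace(tmp_path, self.persist_path)  # atomic rename on the same filesystem
```

Writing to a temporary file and renaming it means a crash during `set_persist` leaves either the old file or the new one, never a half-written file. Constructing a second `ServerVariables` object plays the role of a restart.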

Performance Schema

Performance Schema instrumenting errors (WL#8058): This work by Mayank Prasad makes it possible to look at aggregated counts of error messages reported by the server. The user can look at statistics from five different tables: global count, summary per thread, summary per user, summary per host, and summary per account. For each error message, the user can see the number of errors raised, the number of errors handled by the SQL exception handler, the “first seen” timestamp, and the “last seen” timestamp. Given the right privileges, the user can either SELECT from these tables or TRUNCATE them.

Performance Schema indexes (WL#6616): This work by Chris Powers speeds up Performance Schema queries by adding more than 100 indexes on Performance Schema tables. The indexes on Performance Schema tables are predefined; they cannot be deleted, added, or altered. A Performance Schema index is implemented as a filtered scan across the existing table data, rather than a traversal through a separate data structure. There are no B-trees or hash tables to be constructed, updated, or otherwise managed. Performance Schema table indexes behave like hash indexes in that a) they quickly retrieve the desired rows, and b) they do not provide row ordering, leaving the server to sort the result set if necessary. However, depending on the query, indexes obviate the need for a full table scan and will return a considerably smaller result set. Performance Schema indexes are visible with SHOW INDEXES and are represented in the EXPLAIN output for queries that reference indexed columns. See also the comment from Simon Mudd @ Booking.com.
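The per-error counters described for the error summary tables can be pictured as a small aggregation keyed by account and error number. The Python sketch below is hypothetical. Its field names follow the prose (raised, handled, first seen, last seen), not the real table columns. It also shows how a global count is simply the sum over accounts.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ErrorStats:
    raised: int = 0
    handled: int = 0
    first_seen: Optional[float] = None
    last_seen: Optional[float] = None


class ErrorSummaryByAccount:
    """Hypothetical aggregation keyed by (user, host), then by error number."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], Dict[int, ErrorStats]] = {}

    def record(self, user: str, host: str, error_number: int,
               timestamp: float, handled: bool = False) -> None:
        stats = self.rows.setdefault((user, host), {}).setdefault(error_number, ErrorStats())
        stats.raised += 1
        if handled:
            stats.handled += 1
        if stats.first_seen is None:
            stats.first_seen = timestamp
        stats.last_seen = timestamp  # assumes events arrive in time order

    def global_count(self, error_number: int) -> int:
        # A global view is the sum over every account.
        return sum(errors[error_number].raised
                   for errors in self.rows.values() if error_number in errors)

    def truncate(self) -> None:
        # Mirrors TRUNCATE: reset all collected statistics.
        self.rows.clear()
```

The per-thread, per-user, and per-host summaries would follow the same pattern with a different grouping key.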

Performance Schema, finer-grained interfaces (WL#8880): This work by Marc Alff breaks the existing interface (psi.h) into smaller parts (one per instrumentation). This is an internal refactoring task with no functional change.

Parser Refactoring

We are continuing our effort to refactor the SQL parser incrementally. The old parser had critical limitations because of its grammar complexity and top-down parsing style, which led to poor maintainability and extensibility. In MySQL 5.7 we refactored all of the DML statements, and we are now turning our attention to the DDL statements.

In MySQL 8.0.0, Gleb Shchepa has refactored the CREATE TABLE statement, finishing the umbrella worklog WL#8067 and its children WL#8433, WL#8434, and WL#8435. The current main parser grammar is a mix of regular SQL statements and special data-dictionary-related declarations such as parse_gcol_expr and partition_entry, which are not valid SQL statements but data definitions from the data dictionary. This work is about separating such grammar rules from regular SQL with the help of the lexical scanner. WL#8433 is about separating DD commands from regular SQL queries in the parser grammar, WL#8434 is about refactoring partitioning-related definitions, and WL#8435 is about refactoring column-related definitions.

Allow parsing a single expression (WL#7840): This work by Gleb Shchepa makes it possible to parse expressions independently of their context, e.g. the PARTITION BY …

Merge all SELECT rules into one (WL#8907): This work by Gleb Shchepa merges the seven separate grammar rules for SELECT in different contexts into one. This makes the SELECT syntax uniform in all contexts. One benefit is that adding a WITH clause to the various forms of SELECT will be doable with one single code change in sql_yacc.yy.

Introduce <query expression> parser rule (WL#8083): This work by Martin Hansson introduces the SQL standard's syntactic rule <query expression> and its sub-rules for production of query syntax. These replace our bison rules for representing SELECT/UNION syntax. The syntax duplication for queries and subqueries has been removed; both now use <query expression>. The <table reference> syntax rule is now handled properly, and the syntax NATURAL INNER JOIN …

Optimizer Hints

… WHERE contains a selective predicate which can be pushed down to the view's inner tables. On the other hand, consider derived tables containing dependent subqueries in the select list, such as SELECT * FROM t1 LEFT JOIN (SELECT (subq) FROM t2) AS dt ON ...;. Such tables will always be read as many times as there are qualifying rows in the outer table, and the select-list subqueries are evaluated for each row combination, so it is better to materialize the derived table.

Enabling merging a derived table or view through an optimizer hint (WL#9307): This work by Guilhem Bichot allows users to control whether a derived table or view will be merged or materialized, using the “merge” and “no_merge” hints. We can simply add a hint to tell the optimizer to merge instead of materializing the derived table in the example above: SELECT /*+ merge */ * FROM t1 LEFT JOIN (SELECT (subq) FROM t2) AS dt ON ...;

Cost Model

InnoDB buffer estimates for tables and indexes (WL#7170): This work by Vasil Dimov provides estimates of how much of a table or an index is currently available in a main-memory buffer. When the optimizer chooses which access method to use, it can benefit from knowing whether data is stored in memory or has to be read from disk. This work implements the necessary support in InnoDB for providing these estimates to the handler. It is part of a bigger effort to let the optimizer know whether data is in main memory or on disk and adjust its behavior accordingly.
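One simple way an optimizer could use such an estimate is to blend a cheaper in-memory per-page cost with a more expensive disk per-page cost. The Python function below is only a sketch of that idea. The linear blend and the default constants (0.25 and 1.0) are illustrative assumptions, not MySQL's cost model or its defaults.

```python
def estimated_read_cost(pages: int, fraction_in_buffer: float,
                        memory_page_cost: float = 0.25,
                        disk_page_cost: float = 1.0) -> float:
    """Blend per-page costs by the estimated fraction of pages already in memory.

    The cost constants are illustrative placeholders, not MySQL's defaults.
    """
    if not 0.0 <= fraction_in_buffer <= 1.0:
        raise ValueError("fraction_in_buffer must be in [0, 1]")
    in_memory = pages * fraction_in_buffer
    on_disk = pages - in_memory
    return in_memory * memory_page_cost + on_disk * disk_page_cost
```

Under these assumed constants, a fully cached 2000-page index (cost 500.0) beats an uncached 800-page scan (cost 800.0), even though it touches more pages. Decisions of this kind require knowing what is in memory.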

Histograms

With histograms, the user or DBA can create statistics on data distribution, typically for non-indexed columns, which are then used by the query optimizer to find the optimal query plan. As the first steps toward histogram support in 8.0, we have implemented the following two worklogs. The functionality that actually creates and uses histograms will be implemented in separate worklogs in upcoming DMRs.

Classes/structures for histograms (WL#8707): This work by Erik Froseth implements classes and functions for histogram statistics. We implemented the necessary functions for building both singleton and equi-height histograms.
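The two bucket types differ in what a bucket represents. A singleton bucket holds one distinct value. An equi-height bucket covers a value range that holds roughly the same number of rows as the other buckets. The Python sketch below builds both from a list of column values. The tuple layout (bounds, cumulative frequency, distinct count) is an assumption chosen for illustration and is not the format used by WL#8707.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def singleton_histogram(values: Sequence) -> List[Tuple[object, float]]:
    """One bucket per distinct value: (value, cumulative frequency)."""
    if not values:
        return []
    counts = Counter(values)
    total = len(values)
    buckets, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        buckets.append((value, cumulative / total))
    return buckets


def equi_height_histogram(values: Sequence, num_buckets: int):
    """Buckets of roughly equal row counts: (lower, upper, cumulative frequency, distinct)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    data = sorted(values)
    total = len(data)
    buckets, start = [], 0
    for b in range(num_buckets):
        end = (b + 1) * total // num_buckets  # exclusive target boundary
        # Extend the bucket so that one value never spans two buckets.
        while 0 < end < total and data[end] == data[end - 1]:
            end += 1
        if end <= start:
            continue  # a previous bucket already absorbed this range
        chunk = data[start:end]
        buckets.append((chunk[0], chunk[-1], end / total, len(set(chunk))))
        start = end
    return buckets
```

Singleton histograms are exact but grow with the number of distinct values. Equi-height histograms cap the number of buckets at the cost of assuming values are spread evenly within each bucket.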

Persistent storage of histogram data (WL#8706): This work by Erik Froseth defines the persistent storage format for histogram data. This is needed so that histogram data does not have to be recreated each time the server starts. We store the histogram statistics in a new system table, mysql.column_stats. The histogram itself will be stored in a JSON column, due to the flexibility of the JSON datatype.

GIS


Spatial Reference Systems (WL#8579): This work by Norvald Ryeng adds support for spatial reference system (SRS) definitions, both ellipsoids with latitude and longitude coordinates and flat map projections. This work is the first stepping stone on our road to full geography support. Until now, MySQL has only understood one coordinate system, a unitless 2D Cartesian plane that is not referenced to positions on Earth (SRID 0). This worklog adds support for georeferenced ellipsoids and 2D projections. This includes data dictionary storage and caching of SRS definitions, a parser for the WKT description of SRSs, and a set of default SRSs from the EPSG Geodetic Parameter Dataset. This work does not change the actual calculations. However, it provides a way to add SRS metadata to geometric objects. As of now, computations in geographic SRSs will still be done in SRID 0, but a warning will be issued.

Mutator ST_SRID (WL#8543): This work by Hans Melby makes it possible to update the SRID of an existing geometry. The geometry's coordinates are not transformed, but the SRID tells the server which spatial reference system the geometry is in. This is useful when geometry data is not in the desired spatial reference system, e.g., when upgrading from older MySQL versions that predate spatial reference system support.

Mutator ST_X and ST_Y (WL#8606): This work by Jens Even B. Blomsøy makes it possible to update the X and Y coordinates of existing point geometries. It also makes the ST_X and ST_Y functions consistently raise an error if the geometry is not a point. This improves both OpenGIS and SQL/MM compliance.

Improved Performance for Scans

Optimizer provides InnoDB with a bigger buffer (WL#7093): This work by Knut Hatlen improves MySQL/InnoDB range scan performance by giving InnoDB a buffer and a batch size to read. Even before this work, InnoDB used a small internal buffer to fetch records in batches for scans, in order to reduce the cost of latching and B-tree navigation. However, the server layer knows how much data it needs and should decide the batch size, and this is what this work delivers. In SELECT queries, a buffer will be provided for each table and index scan that is believed to read more than one row. The bigger buffer will primarily improve the performance of queries that perform scans reading a large number of rows, for example SELECT * FROM t; or SELECT * FROM t WHERE pk BETWEEN 1000 AND 10000;
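A rough way to see why the batch size matters is to count engine calls. Each call stands in for one round of latching and B-tree repositioning. The Python sketch below is hypothetical. `BatchingCursor`, `fetch_batch`, and `scan` are invented names, and a call count is only a proxy for the real per-batch overhead.

```python
from typing import Iterator, List, Sequence


class BatchingCursor:
    """Hypothetical engine-side cursor that returns up to batch_size rows per call."""

    def __init__(self, rows: Sequence, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.rows = rows
        self.batch_size = batch_size
        self.position = 0
        self.fetch_calls = 0  # proxy for per-call latching and B-tree positioning

    def fetch_batch(self) -> List:
        self.fetch_calls += 1
        batch = list(self.rows[self.position:self.position + self.batch_size])
        self.position += len(batch)
        return batch


def scan(cursor: BatchingCursor) -> Iterator:
    """Server-side loop: consume rows one by one, refilling from the engine in batches."""
    while True:
        batch = cursor.fetch_batch()
        if not batch:
            return
        yield from batch
```

For the 9,001 keys in `pk BETWEEN 1000 AND 10000`, a batch size of 1000 needs 11 engine calls (the last one returns nothing), while a batch size of 1 needs 9,002. The server knows the expected row count, so it is in the better position to choose the batch size.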

Plugin Infrastructure


Service registry and component infrastructure (WL#4102): This work by Marcin Babij implements a new plugin infrastructure. The old infrastructure is still operational, and we will gradually shift from the old to the new infrastructure. In the new infrastructure, each component provides a set of services that other components can consume. To facilitate this, there is a registry of all services available to all components. Each component communicates with other components only through services and explicitly states the services it provides and consumes. The infrastructure will enable components to override and complement functionality provided by other components by re-implementing the relevant service APIs.
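The core idea is that components never reference each other directly. They look up services by name in a shared registry. The Python sketch below illustrates that pattern. `ServiceRegistry` and its methods are hypothetical and are not the server's component API. The acquisition count stands in for whatever mechanism prevents a provider from disappearing while a consumer still holds its service.

```python
from typing import Any, Dict


class ServiceRegistry:
    """Hypothetical registry: components look up services by name only."""

    def __init__(self) -> None:
        self._services: Dict[str, Any] = {}
        self._users: Dict[str, int] = {}  # active acquisitions per service

    def register(self, name: str, implementation: Any) -> None:
        if name in self._services:
            raise KeyError(f"service {name!r} is already registered")
        self._services[name] = implementation
        self._users[name] = 0

    def acquire(self, name: str) -> Any:
        implementation = self._services[name]  # KeyError if nobody provides it
        self._users[name] += 1
        return implementation

    def release(self, name: str) -> None:
        if self._users.get(name, 0) == 0:
            raise RuntimeError(f"service {name!r} was not acquired")
        self._users[name] -= 1

    def unregister(self, name: str) -> None:
        # Refuse to remove a service that a consumer still holds.
        if self._users.get(name, 0) > 0:
            raise RuntimeError(f"service {name!r} is still in use")
        del self._services[name]
        del self._users[name]
```

In this sketch, overriding a service means unregistering one implementation and registering another under the same name. Consumers that acquire the service by name pick up the replacement without code changes. A real registry might instead keep several implementations per service and mark one as the default. That variation is an assumption this sketch does not model.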


 

BLOB


InnoDB: Refactor BLOB code to facilitate partial fetch/update (WL#9141, WL#8985): This work by Annamalai Gurusami is part of a bigger effort to implement partial fetch and update of BLOBs. So far we have refactored the uncompressed BLOB code as described in WL#9141 and the compressed BLOB code as described in WL#8985. We want an efficient BLOB implementation, since more and more people use BLOBs and they use bigger and bigger BLOBs. A driving factor here is that JSON documents are stored as BLOBs. A main goal for 8.0 is to replicate partial BLOB updates, so that the update of a single JSON attribute replicates efficiently.

InnoDB Memcached

InnoDB_Memcached: support multiple get and range search (WL#6650): This work by Jimmy Yang enhances the InnoDB Memcached functionality with multiple get operations and support for range queries. We added support for the multiple get operation to further improve read performance, i.e. the user can fetch multiple key-value pairs in a single memcached query. Support for range queries …

Persisting Fast Changing Meta-data

InnoDB persistent max value for autoinc columns (WL#6204): This work by Bin Su persists the AUTOINC counters by writing them to the redo log. This is a fix for the very old Bug#199. The MySQL recovery process will replay the redo log and ensure correct values of the AUTOINC counters. There won't be any rollback of AUTOINC counters. This means that database recovery will re-establish the last known counter value after a crash. It comes with the guarantee that the AUTOINC counter cannot get the same value twice. The counter is monotonically increasing, but note that there can be gaps (unused values). The lack of a persistent AUTOINC has been seen as troublesome in the past, e.g. see Bug#21641 reported by Stephen Dewey in 2006 or this blog post.
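The guarantee "never the same value twice, but gaps allowed" follows from logging each counter value durably before handing it out and never logging a rollback. The Python sketch below models the redo log as a plain list that survives a simulated crash. That simplification, and the class name, are assumptions for illustration only.

```python
from typing import List


class AutoIncCounter:
    """Hypothetical sketch: each counter value is logged before it is handed out."""

    def __init__(self, redo_log: List[int]) -> None:
        self.redo_log = redo_log  # stands in for the durable redo log
        # Recovery: replay the log and resume from the last logged value.
        self.value = max(redo_log, default=0)

    def next_value(self) -> int:
        self.value += 1
        self.redo_log.append(self.value)
        return self.value
```

If the insert that received value 2 is rolled back and the server then crashes, recovery resumes after 2 and hands out 3. Value 2 becomes a gap but is never reused, which matches the monotonic, gap-tolerant behavior described above.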

InnoDB: Persist the “corrupted” flag in the data dictionary (WL#7816): This work by Bin Su makes the “corrupted” flag persistent by writing it to the redo log. For example, when InnoDB notices corruption in an index tree, it flags the index as corrupted, both in its internal data dictionary cache and in persistent storage. This means that the server will learn about corrupted indexes during recovery and thus avoid using such indexes for queries.

Temporary Tables

InnoDB: Map compressed temporary tables to uncompressed (WL#7899): This work by Bin Su removes InnoDB support for compressed temporary tables. Temporary tables are short-lived and ideally never written back to files, unless buffer pool space is tight. It does not make much sense to try to reduce file I/O by increasing memory and CPU usage. All temporary tables will be created in the shared temporary tablespace “ibtmp1” introduced in WL#6560. Two columns will be dropped from the information schema table because they would always be FALSE: INNODB_TEMP_TABLE_INFO.PER_TABLE_TABLESPACE and INNODB_TEMP_TABLE_INFO.IS_COMPRESSED.

Store temporary table metadata in memory (WL#7784): This work by Dmitry Lenev ensures that metadata for temporary tables is stored in memory only (not persisted). Until now, temporary table metadata was stored in FRM files. The historical reasons for this became obsolete in 5.7, because InnoDB stores temporary tables in a dedicated tablespace, which is discarded on startup. Note that the MySQL temporary table implementation differs from the SQL standard in that MySQL temporary tables are not shown in the information schema.

Performance & Scalability

InnoDB: Group purging of rows by table ID (WL#9387): This work by Sunny Bains groups the undo rows by table ID and then parcels out the work to the purge threads based on the table ID. The purpose is to reduce dict_index_t::lock contention. Since the introduction of multi-threaded purge, the purge threads have worked on random undo log records. If different purge threads all purge from the same table, this leads to higher contention on the dict_index_t::lock (RW-LOCK). See also Bug#81368 reported by Domas Mituzas @ Facebook.

InnoDB: Remove the buffer pool mutex (WL#8423): This work by Shaohua Wang splits the buffer pool mutex into several mutexes (free_list, LRU_list, zip_free, and zip_hash). This avoids or reduces scalability problems for some workloads where the old buffer_pool_mutex became hot. Thank you to Percona, who contributed the original patch in Bug#75534!

Split LOCK_thd_list and LOCK_thd_remove mutexes (WL#9250): This work by Jon Olav Hauglid removes a bottleneck for connect/disconnect performance by splitting up the two mutexes LOCK_thd_list and LOCK_thd_remove. Both mutexes protect the global list of current connections (THDs). Performance testing shows around 5% improvement in TPS for point selects with a reconnect between each query.
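The grouping idea can be sketched as follows. Bucket the undo records by table ID, then hand each whole bucket to a single purge thread, so that no two threads work on the same table's index lock at the same time. In the Python sketch below, the record shape, the function name, and the largest-group-first balancing heuristic are all assumptions. The source says only that work is parceled out by table ID.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

UndoRecord = Tuple[int, int]  # hypothetical shape: (table_id, record_number)


def assign_purge_work(records: List[UndoRecord], n_threads: int) -> List[List[UndoRecord]]:
    """Give each table's undo records to exactly one purge thread."""
    by_table: Dict[int, List[UndoRecord]] = defaultdict(list)
    for record in records:
        by_table[record[0]].append(record)
    queues: List[List[UndoRecord]] = [[] for _ in range(n_threads)]
    # Assumed heuristic: largest groups first, each to the least-loaded thread.
    for table_id in sorted(by_table, key=lambda t: (-len(by_table[t]), t)):
        target = min(range(n_threads), key=lambda i: (len(queues[i]), i))
        queues[target].extend(by_table[table_id])
    return queues
```

Compared with handing out records at random, this design trades a little load-balancing precision for the guarantee that each table is purged by one thread at a time.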

Compilers and Libraries

Enable use of C++11 in MySQL server (WL#8896): This work by Jon Olav Hauglid enables the use of C++11 features in 8.0. C++11 has a number of new features, especially related to concurrency, that are of interest. C++11 is supported by GCC, Clang, Visual Studio, and Solaris Studio.

InnoDB: Move from homebrew thread management to std::thread (WL#9359): This work by Sunny Bains introduces the std::thread library, which will replace our own homebrew thread libraries. This is enabled by moving to C++11 in 8.0, since the C++11 standard supports threading.

Doxygen Documentation

Starting with MySQL release 8.0.0, the source code is commented using the Doxygen documentation generation tool. Several community members have pointed out that our internals documentation has been outdated (like here). With this initiative we hope to keep the internal docs up to date by aligning them with the source code and using Doxygen formatting. By documenting our source code better, we make it easier to understand and maintain, and we lower the barrier to entry for code contributions. This is our first version, and we will build it out over time. Check it out here: http://dev.mysql.com/doc/dev/mysql-server/8.0.0/.

Replication

Restore backup on existing GTID-aware server, make GTID_PURGED settable always (WL#6591): This work by Andrei Elkin makes it possible to restore backups on an existing server without losing existing Global Transaction ID (GTID) information and binary logs. Until now, it has only been possible to add GTIDs to GTID_PURGED when GTID_EXECUTED is empty, i.e. when restoring a backup on an empty server. This work makes it possible to add GTIDs to GTID_PURGED at any time, even when GTID_EXECUTED is not empty.

Enhanced Applier Thread Progress Details (WL#7364): This work by Luis Soares adds monitoring information in Performance Schema showing how far the SQL thread has processed each ROW event. For this purpose, Row Based Replication (RBR) stages have been added to the performance_schema stages tables. The motivation is that the stats shown through SHOW PROCESSLIST, INFORMATION_SCHEMA.PROCESSLIST, PERFORMANCE_SCHEMA.THREADS, and PERFORMANCE_SCHEMA.EVENTS_STAGES_* have been misleading when the SQL thread applies row events. In fact, wrong information can currently be reported through SHOW PROCESSLIST. Therefore, users should at least be able to see, through Performance Schema tables, the correct state of the SQL thread and exactly what it is doing at the moment. This minimal information is of great help when troubleshooting RBR problems. We hope to extend this in the future to cover more stats for RBR. See also Bug#69360 by Simon Mudd, Bug#62019 by Mark Callaghan, and Bug#53375 by Shane Bester.

Transactional Data Dictionary – Deep Dive

MySQL 8.0 comes with a transactional data dictionary. The goals of this project have been published earlier, e.g. in Ståle Deraas’ blog post. Here our focus will be on what it is …

MySQL 5.7 does not have an explicitly defined data dictionary; as an approximation, one can point to the persistent .FRM files (file system) and to the table definition cache as the in-memory representation of tables. This changes with MySQL 8.0, which has an explicit definition of what the data dictionary is and is not, namely an identified set of meta-data tables stored in transactional storage (InnoDB). WL#6379 explains the data dictionary table structure. Moreover, each MySQL release will have a data dictionary version number, which will be incremented whenever there is a change in the data dictionary definition. The in-memory representation is in the form of objects kept in a dictionary cache. In the 8.0.0 DMR, the data dictionary tables are stored in file-per-table tablespaces. Before 8.0 GA we will store dictionary tables in a single dedicated dictionary tablespace.

With the introduction of an explicit data dictionary, it becomes more important to note the difference between *system tables* and the data dictionary. Historically, MySQL has stored some of its meta-data as normal MyISAM tables in the *mysql database* (schema), in contrast to meta-data stored in special files (.FRM, .PAR, .TRG, .TRN). Note that the concept of system tables still exists in MySQL 8.0, but these tables are now stored in InnoDB. Some of the MySQL 5.7 system tables have now been folded into the data dictionary (event and proc), while the rest are kept as system tables. The distinction between the data dictionary and the set of system tables is somewhat pragmatic, but one can say that the data dictionary contains the meta-data needed to execute SQL queries, while system tables contain auxiliary meta-data like time zone and help information. In the future it might be natural to fold additional system tables into the data dictionary in order to have a common infrastructure for all meta-data.

mysql_upgrade

Information Schema

The above is roughly what has been delivered in the 8.0.0 DMR, the first step in the overall data dictionary project (runtime step). Here comes a preview of the second and last step that will complete the project (innodb step). There are three main parts:

First, we will eliminate the InnoDB internal data dictionary, use one common data dictionary, and ensure crash safety for all DDL statements. Meta-data for foreign keys will be moved up from the SE layer to the common data dictionary, paving the way for proper meta-data locking for foreign keys. This part will significantly increase MySQL reliability.

Second, we will write meta-data in a serialized form into user tablespaces on disk, providing meta-data redundancy and facilitating movement of data. We will also provide tools to check, extract, and insert tablespace meta-data. This part will match the benefits associated with the .FRM files, namely the ability to edit meta-data in some disaster recovery scenarios.

Third, we will implement import/export of self-contained tablespaces (data + meta-data). This part will match the benefits associated with moving MyISAM tables around.

With this overview, here are the worklogs we delivered in 8.0.0.

Start & Upgrade

Bootstrap code for new data dictionary (WL#6394): This work by Sivert Sørumgård implements server bootstrap in the context of the new data dictionary in 8.0. Bootstrapping is done when starting the mysqld server process. There are four basic aspects of this: 1) creating the data dictionary while starting a new server instance, 2) (re)starting a server with existing data, 3) upgrading from a previous 8.0 version, and 4) upgrading from 5.7 to 8.0, i.e., starting to use the new data dictionary on already existing user data. The general direction is for the server to be able to bootstrap itself and to eliminate external scripts from the process.

Upgrade to Transactional Data Dictionary (WL#6392): This work by Abhishek Rajan implements in-place upgrade from 5.7 to 8.0. The upgrade works as it did for previous releases such as 5.6 and 5.7: the user starts the 8.0.0 server on a 5.7 image and then runs mysql_upgrade. … mysqldump. This will fix any changes in system tables between mysql-5.7 and mysql-8.0. It will also fix the performance schema, sys schema, and information schema databases.

Information Schema

New Data Dictionary and I_S integration (WL#6599): This work by Gopal Shankar re-implements information schema so that most data is read from data dictionary tables, thus avoiding the creation of temporary tables and the preparation of TABLE_SHARE …

InnoDB: provide a way to do non-locking reads (WL#7464): This work by Satya Bodapati prevents information schema queries from being blocked by a parallel DDL operation. The new data dictionary allows information schema queries to get their data directly from data dictionary tables. Information Schema tables will be represented as views, so a query on an information schema table will be a query on a view. Information schema queries can be executed under different isolation levels set by the user, so it is possible that information schema queries would be blocked by a parallel DDL operation. This would be a compatibility problem with respect to 5.7. To avoid this problem, we implemented a way to request non-locking reads per handler instance.

Use non-locking reads for data dictionary tables under I_S view (WL#7907): This work by Gopal Shankar implements non-locking reads on data dictionary tables when they are used under a “system view”, which matches the behavior in 5.7. This is the server-layer counterpart to WL#7464, explained above.

Change DDL to update rows for view columns in DD.COLUMNS and other dependent values (WL#7167): This work by Praveenkumar ensures that a view's column information is stored in the data dictionary table mysql.COLUMNS. Moreover, the implementation of DDL statements affecting the underlying tables of a view is extended to update view column information in mysql.COLUMNS and mysql.TABLE.IS_UPDATABLE. This speeds up information schema queries, because column information can be found by direct lookup.

Protect Data Dictionary tables (WL#6391): This work by Sivert Sørumgård restricts users' direct access to the data dictionary tables. User access to meta-data is supported through the Information Schema. We do this to maintain predictable data dictionary table definitions, maintain meta-data correctness, and separate interface from implementation. Executing DDL and DML statements on these tables is still supported for statements originating from the server itself, e.g. during initial start and upgrade.

System Tables

… moving system tables from MyISAM to transactional storage (InnoDB). The new implementation assumes transactional semantics but can, in principle, be stored in any transactional storage engine. The 5.7 system tables event and proc have been folded into the data dictionary. Note: we have also added a few new system tables, but this is not related to the new data dictionary. The new system tables are role_edges and default_roles (WL#988), components (WL#4102), st_spatial_reference_systems (WL#8579), and column_stats (WL#8706).

In 5.7 we started the work of moving system tables into transactional storage: we moved the replication-related system tables, the time zone tables (WL#7159), and the plugins and servers tables (WL#7160). In the 8.0.0 DMR we have finalized this work by moving the privilege system tables (WL#7158 by Dmitry Shulga) and the User-Defined Function (UDF) system tables (WL#8980 by Thayumanavar Sachithanantha). The main work here has been to change the surrounding code to correctly deal with transactional semantics when interacting with transactional storage.

Make user management DDLs atomic (WL#9045) — This work by Harin Vadodaria makes sure that DDL statements related to user management behave atomically. CREATE/ALTER/DROP USER and GRANT/REVOKE
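The all-or-nothing behavior can be sketched in Python as below. This is a hypothetical model, not the server's implementation: a multi-user statement is applied to a staged copy of the account list, and the copy is published only if every step succeeds, so an error part-way through leaves no partial change. The sketch assumes that such a statement fails as a whole when any single account fails, which is what "atomic" implies here; the class and exception names are invented for illustration.

```python
class UserManagementError(Exception):
    """Hypothetical error raised when one account in a statement fails."""


class Privileges:
    def __init__(self):
        self.users = {}  # account name -> set of privileges

    def create_user(self, *names):
        self._atomic(self._create, names)

    def drop_user(self, *names):
        self._atomic(self._drop, names)

    @staticmethod
    def _create(users, name):
        if name in users:
            raise UserManagementError(f"user {name!r} already exists")
        users[name] = set()

    @staticmethod
    def _drop(users, name):
        if name not in users:
            raise UserManagementError(f"user {name!r} does not exist")
        del users[name]

    def _atomic(self, step, names):
        # Apply every step to a staged copy, then publish it in one assignment
        # (the "commit"). If any step raises, self.users is left untouched.
        staged = {name: set(privs) for name, privs in self.users.items()}
        for name in names:
            step(staged, name)
        self.users = staged


p = Privileges()
p.create_user("alice", "bob")
try:
    p.drop_user("alice", "carol")  # carol does not exist: whole statement fails
except UserManagementError as err:
    print("error:", err)
print(sorted(p.users))  # ['alice', 'bob']
```

In the server, the staging and publishing roles are played by the InnoDB transaction on the privilege tables, which is why moving those tables to transactional storage comes first.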

Serialized Dictionary Information (SDI)

Provide data dictionary information in serialized form (WL#7069) — This work by Dyre Tjeldvoll implements replacement functionality for the old .FRM files in the context of the new data dictionary. The old .FRM files stored the metadata in a serialized form, i.e. the CREATE TABLE
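As a rough illustration of what "serialized form" means, the Python sketch below round-trips a table definition through a text encoding. JSON, the field names, and the `sdi_version` key are assumptions chosen for the example; the paragraph above does not specify the actual encoding or layout.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Column:
    name: str
    type: str
    nullable: bool = True


@dataclass
class Table:
    schema: str
    name: str
    columns: list = field(default_factory=list)


def serialize(table: Table) -> str:
    # asdict() converts nested dataclasses (the Column list) recursively.
    doc = {"sdi_version": 1, "object": asdict(table)}  # hypothetical layout
    return json.dumps(doc, sort_keys=True)


def deserialize(text: str) -> Table:
    obj = json.loads(text)["object"]
    columns = [Column(**col) for col in obj["columns"]]
    return Table(schema=obj["schema"], name=obj["name"], columns=columns)


t1 = Table(
    "test",
    "t1",
    [Column("id", "INT", nullable=False), Column("name", "VARCHAR(64)")],
)
text = serialize(t1)
assert deserialize(text) == t1  # the round trip preserves the definition
```

A self-contained text form like this is what lets a definition travel with the data it describes.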

Data Dictionary Cache

Dictionary object cache (WL#8150) — This work by Sivert Sørumgård implements the data dictionary cache. It provides a shared cache for data dictionary objects, tracks their usage, and handles their modification. The dictionary object cache is designed to replace several old caches.
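A minimal Python sketch of the three responsibilities listed above (sharing, usage tracking, and modification) follows. It assumes reference counting for usage and least-recently-released eviction of unused objects. The paragraph does not state either policy. Both policies, the class name, and the plain dict standing in for the dictionary tables are illustrative assumptions.

```python
from collections import OrderedDict


class DictionaryCache:
    """Hypothetical shared cache for dictionary objects (illustration only)."""

    def __init__(self, store, capacity=2):
        self._store = store            # stands in for the persistent DD tables
        self._capacity = capacity      # max number of unused objects kept
        self._objects = {}             # name -> cached object
        self._refs = {}                # name -> number of current users
        self._unused = OrderedDict()   # unused names, least recently released first

    def acquire(self, name):
        if name not in self._objects:  # cache miss: read from the store
            self._objects[name] = self._store[name]
            self._refs[name] = 0
        self._unused.pop(name, None)   # in use again, so not evictable
        self._refs[name] += 1
        return self._objects[name]

    def release(self, name):
        self._refs[name] -= 1
        if self._refs[name] == 0:
            self._unused[name] = None
            self._evict()

    def update(self, name, obj):
        # Modification: write through to the store and refresh the cached
        # copy so that subsequent acquires see the new definition.
        self._store[name] = obj
        if name in self._objects:
            self._objects[name] = obj

    def cached(self):
        return set(self._objects)

    def _evict(self):
        # Only objects that nobody is using may be dropped from the cache.
        while len(self._unused) > self._capacity:
            name, _ = self._unused.popitem(last=False)
            del self._objects[name]
            del self._refs[name]


store = {"t1": "def1", "t2": "def2", "t3": "def3"}
cache = DictionaryCache(store, capacity=2)
for name in ("t1", "t2", "t3"):
    cache.acquire(name)
    cache.release(name)
print(sorted(cache.cached()))  # ['t2', 't3']: t1 was evicted
```

Objects that are still acquired are never evicted, which is the point of tracking usage.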

InnoDB

InnoDB: Provide storage for tablespace dictionary (WL#7053) — This work by Satya Bodapati provides an API and a storage location for serialized dictionary information, in the context of the new data dictionary. In 8.0, in addition to being stored in normal InnoDB tables in the global data dictionary tablespace, the metadata is also stored in serialized form together with the user tables that contain the data. This makes user tablespaces self-describing, so they can be moved to a different MySQL instance (import/export), tools can extract metadata from them, etc. It also provides redundancy, with one copy in the global data dictionary tablespace and one or two copies in the tablespaces containing the data. This redundancy offers a way out in certain disaster recovery scenarios where the dictionary information is lost or damaged. Note that this is preparatory work; it has no effect in 8.0.0.
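The recovery benefit of keeping several copies can be sketched as follows. This Python example is hypothetical. It assumes that each copy carries a CRC-32 checksum and that recovery simply takes the first copy that verifies. The paragraph does not say how copies are validated or chosen, and, as noted, none of this has an effect in 8.0.0.

```python
import json
import zlib


def make_copy(metadata: dict) -> dict:
    """Serialize metadata and attach a checksum (hypothetical layout)."""
    payload = json.dumps(metadata, sort_keys=True)
    return {"payload": payload, "crc32": zlib.crc32(payload.encode("utf-8"))}


def recover(copies):
    """Return the metadata from the first copy whose checksum verifies."""
    for candidate in copies:
        payload = candidate["payload"]
        if zlib.crc32(payload.encode("utf-8")) == candidate["crc32"]:
            return json.loads(payload)
    raise ValueError("no intact copy of the dictionary information")


meta = {"table": "t1", "columns": ["id", "name"]}
global_copy = make_copy(meta)      # copy in the global DD tablespace
tablespace_copy = make_copy(meta)  # copy stored with the user data
# Simulate damage to the global copy; its checksum no longer matches.
global_copy["payload"] = global_copy["payload"].replace("t1", "t2")
print(recover([global_copy, tablespace_copy]))
```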

External tool to extract InnoDB tablespace dictionary information (WL#7066) — This work by Satya Bodapati implements a tool for extracting Serialized Dictionary Information

Data Dictionary APIs

Develop Google test framework and guidelines for data dictionary API (WL#7770) — This work by Sivert Sørumgård develops a Google Test framework for writing unit tests for the data dictionary API work. It also provides guidelines for writing such tests.

Make sure errors are properly handled in data dictionary API (WL#7771) — This work by Gopal Shankar harmonizes error handling across all data dictionary APIs and ensures that error reporting is consistent and complete across these APIs.

Formulate framework for API for data dictionary (WL#6380) — This work by Alexander Nozdrin defines basic principles and guidelines for implementing the new data dictionary. In other words: how

Implement common code for different data dictionary APIs (WL#7284) — This work by Alexander Nozdrin implements common code following the guidelines set in WL#6380. The different API implementations are built on a core API and share common code; this work specifies that core API.

Define and Implement API for data dictionary objects (multiple worklogs) — This work by Gopal Shankar specifies the interfaces of the following new data dictionary object types: Function, Procedure, Parameter (WL#6384), Event (WL#6388), Trigger (WL#6383), Partition, Partition_value, Partition_index (WL#7630), Abstract_table, Table, Column, Column_type_element, Index, Index_column, Foreign_key, Foreign_key_column (WL#6382), View (WL#6389), Tablespace, Tablespace_file (WL#6387), and Charset, Collation, Schema (WL#6385). The work follows the API framework class definitions described in WL#7284.

Use new data dictionary API for data dictionary objects (multiple worklogs) — This work by Gopal Shankar, Mattias Jonsson, Dmitry Shulga, Praveenkumar Hulakund, and Thayumanavar Sachithanantha changes the server to use the new data dictionary. Tables: WL#6390 makes the server use the new API defined by WL#6382 / WL#6389 / WL#6387 / WL#6385 to store metadata instead of .FRM files. Partitions: WL#7836 makes the server use the new API defined by WL#7630 instead of .PAR files. Triggers: WL#7896 makes the server use the new API defined by WL#6383 instead of .TRG/.TRN files. Stored Routines: WL#7897 makes the server use the new API defined by WL#6384 to create/drop/load stored routines. Events: WL#7898 makes the server use the new API defined by WL#6388 to create, alter, drop and load events.

Move FOREIGN KEY constraints to the global data dictionary (WL#6929) — This work by Jon Olav Hauglid populates the new data dictionary tables with foreign key information. See WL#6379 for the foreign key table definitions. Metadata for foreign keys is moved up from the SE layer to the common data dictionary, paving the way for proper metadata locking for foreign keys. Note that this is preparatory work: the new dictionary tables are populated but not yet used, so in 8.0.0 foreign keys work as they do in 5.7.

Deprecation and Removal

Deprecate and remove partitioning storage engine (WL#8971) — This work by Sivert Sørumgård deprecates (5.7) and removes (8.0) the partitioning storage engine (ha_partition). The responsibility for partitioning has been moved down to the storage engine layer. InnoDB supports partitioning from 5.7 onwards.

Remove --ssl, --ssl-verify-server-cert client-side options (WL#9091) — This work by Ramil Kalimullin removes the --ssl and --ssl-verify-server-cert client options in 8.0 (deprecated in 5.7). This includes the removal of --ssl synonyms such as --skip-ssl. (The server-side --ssl option remains unchanged.) These client-side options were replaced by the --ssl-mode client option in 5.7 (WL#8785).

Deprecation warnings for bit-wise operation with [VAR]BINARY args (WL#9015) — This work by Catalin Besleaga adds a warning in 5.7 for the cases where bit-wise operations change from returning BIGINT in 5.7 to returning BINARY in 8.0. See the “Backward-compatibility and incompatibilities” section in WL#8699.

Remove deprecated GIS functions (WL#8157) — This work by Erik Froseth removes function names that were deprecated in 5.7 (WL#8055, WL#7541). This is a standardization effort: the functionality is still present, but the function name with the ST_ or MBR prefix must be used instead.

Deprecate and remove mysql_shutdown() (WL#9014) — This work by Georgi Kodinov deprecates the COM_SHUTDOWN RPC and the C API function mysql_shutdown() in 5.7 and removes them in 8.0. This shutdown functionality has been replaced by the SHUTDOWN SQL statement (COM_QUERY RPC), see WL#6784.

Deprecate and remove the parameter innodb_support_xa (WL#8843) — This work by Marko Mäkelä deprecates the obsolete innodb_support_xa start-up option / session variable in 5.7 and removes it in 8.0. The default value of innodb_support_xa

InnoDB: Remove deprecated parameter innodb_stats_sample_pages (WL#8903) — This work by Annamalai Gurusami removes the parameter innodb_stats_sample_pages in 8.0; the parameter was deprecated in 5.6. Use innodb_stats_transient_sample_pages instead.
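For code that still uses the GIS function names removed by WL#8157, migration can be as simple as renaming the calls. The Python helper below is a hypothetical sketch, not a MySQL tool. Its table lists only a few of the names deprecated in 5.7, together with their replacements, and is not the complete list. Its regular expression does not understand string literals or comments, so its output should be reviewed.

```python
import re

# Partial, illustrative mapping from names deprecated in 5.7 (removed in 8.0)
# to their ST_/MBR replacements. Consult the release notes for the full list.
RENAMES = {
    "AsText": "ST_AsText",
    "GeomFromText": "ST_GeomFromText",
    "GLength": "ST_Length",
    "Area": "ST_Area",
    "Contains": "MBRContains",
    "Within": "MBRWithin",
}

# \b keeps already-migrated names (ST_AsText, MBRContains) from matching,
# because '_' and letters are word characters; '(' limits this to calls.
_PATTERN = re.compile(r"\b(" + "|".join(RENAMES) + r")(\s*\()", re.IGNORECASE)
_LOOKUP = {old.lower(): new for old, new in RENAMES.items()}


def rename_gis_functions(sql: str) -> str:
    """Rewrite calls to removed GIS function names (naive about quoting)."""
    return _PATTERN.sub(lambda m: _LOOKUP[m.group(1).lower()] + m.group(2), sql)


print(rename_gis_functions(
    "SELECT AsText(g), GLength(ls) FROM t WHERE Contains(@poly, g)"
))
# SELECT ST_AsText(g), ST_Length(ls) FROM t WHERE MBRContains(@poly, g)
```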

InnoDB: Remove deprecated parameter innodb_locks_unsafe_for_binlog (WL#8894) — This work by Annamalai Gurusami removes the parameter innodb_locks_unsafe_for_binlog in 8.0; the parameter was deprecated in 5.6. The option is no longer needed, as the READ COMMITTED transaction isolation level provides the same behavior.

InnoDB: Remove the deprecated parameter innodb_checksums (WL#8893) — This work by Annamalai Gurusami removes the parameter innodb_checksums, which was deprecated in 5.6. The functionality is covered by innodb_checksum_algorithm.

InnoDB: Remove the use of *.isl files (WL#6416) — This work by Marko Mäkelä removes *.isl

InnoDB startup refactoring (WL#7488) — This work by Marko Mäkelä refactors the InnoDB startup code. On Windows, setting innodb_flush_method has been decoupled from innodb_use_native_aio. The value innodb_flush_method=unbuffered

InnoDB: Remove deprecated file format parameters in 8.0 (WL#7704) — This work by Marko Mäkelä removes the configuration variables (options) innodb_file_format, innodb_file_format_check, innodb_file_format_max, and innodb_large_prefix in 8.0. These options were deprecated in 5.7, see WL#7703. The file_format column in the information schema tables innodb_sys_tablespaces and innodb_sys_tables has also been removed. See also WL#8307 “Make ROW_FORMAT=DYNAMIC the default”.

Deprecate mysql_plugin in 5.7 and remove in 8.0 (WL#8927) — This work by Dyre Tjeldvoll deprecates the mysql_plugin utility in 5.7 and removes it in 8.0. This utility is no longer in use; alternatives include loading plugins at server startup using the --plugin-load or --plugin-load-add option, or at runtime using the INSTALL PLUGIN

Remove deprecated LOAD DATA events from replication code (WL#9218) — This work by Libing Song removes binlogging of the following events for LOAD DATA statements: LOAD_EVENT, CREATE_FILE_EVENT, EXEC_LOAD_EVENT and NEW_LOAD_EVENT. They were deprecated in 5.0. These events have been replaced by BEGIN_LOAD_QUERY_EVENT and EXECUTE_LOAD_QUERY_EVENT.
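Several items above remove server options. Before an upgrade, an option file can be scanned for them. The Python sketch below is a hypothetical helper, not a MySQL utility. It covers only the options named in this section and treats dashes and underscores in option names as equivalent, as MySQL option files do. It does not handle option prefixes or quoted values, so its output is a starting point for review.

```python
# Options removed in 8.0 according to this section, with suggested follow-ups.
# None means "no direct replacement named here".
REMOVED_OPTIONS = {
    "innodb_support_xa": None,
    "innodb_stats_sample_pages": "use innodb_stats_transient_sample_pages",
    "innodb_locks_unsafe_for_binlog": "use the READ COMMITTED isolation level",
    "innodb_checksums": "use innodb_checksum_algorithm",
    "innodb_file_format": None,
    "innodb_file_format_check": None,
    "innodb_file_format_max": None,
    "innodb_large_prefix": None,
}


def check_option_file(text: str):
    """Yield (line number, option, advice) for each removed option found."""
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing '#' comments
        if not line or line.startswith(("[", "!", ";")):
            continue  # blank, [group] header, !include directive, ';' comment
        name = line.split("=", 1)[0].strip().replace("-", "_").lower()
        if name in REMOVED_OPTIONS:
            yield lineno, name, REMOVED_OPTIONS[name] or "remove it"


cnf = """[mysqld]
innodb-file-format = Barracuda
innodb_buffer_pool_size = 1G
innodb_stats_sample_pages=16  # tuned
"""
for lineno, name, advice in check_option_file(cnf):
    print(f"line {lineno}: {name}: {advice}")
```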

Remove the code in log_event_old.cc (WL#9217) — This work by Libing Song removes code that has not been in use since 5.1. See WL#9218 above.

Remove mysql_install_db and server --bootstrap option (WL#9071) — This work by Sivert Sørumgård removes mysql_install_db and the server --bootstrap option in MySQL 8.0, both deprecated in 5.7 by WL#7307. They have been replaced by new bootstrap code, see WL#6394 above.