
  • I. What exactly is metadata?
  • II. Where does metadata come from?
  • III. What can we do with metadata?
  • IV. Data Catalog for Digital Transformation
  • 1. Introduction
  • 2. Data Catalog Objectives and Benefits
  • 3. Data Catalog Features
  • V. Metadata management tools


I. What exactly is metadata?

1. Metadata is data that describes data


2. Metadata management is the core and foundation of data governance. It helps answer questions such as:


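  • What data do we have?
  • Where is the data located?
  • What type is each piece of data?
  • How are the data related to each other?
  • Which data are frequently referenced, and which are never touched?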
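
The following sketch makes the first four questions concrete with a minimal in-memory metadata catalog. It is an illustration under stated assumptions. The record fields (`name`, `location`, `data_type`, `upstream`) and the sample dataset names are hypothetical, and real metadata tools store far richer information.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """One catalog entry (hypothetical fields, for illustration only)."""
    name: str
    location: str   # system or database that holds the data
    data_type: str  # kind of asset, e.g. "table" or "file"
    upstream: list[str] = field(default_factory=list)  # datasets this one is derived from


def what_we_have(catalog: list[DatasetMetadata]) -> list[str]:
    """Answer "What data do we have?" with a sorted list of dataset names."""
    return sorted(d.name for d in catalog)


def group_by(catalog: list[DatasetMetadata], attr: str) -> dict[str, list[str]]:
    """Answer "Where is the data?" (attr="location") or "What type is it?" (attr="data_type")."""
    groups: dict[str, list[str]] = defaultdict(list)
    for d in catalog:
        groups[getattr(d, attr)].append(d.name)
    return dict(groups)


def relationships(catalog: list[DatasetMetadata]) -> list[tuple[str, str]]:
    """Answer "How are the data related?" as (source, derived) pairs."""
    return [(src, d.name) for d in catalog for src in d.upstream]


# Hypothetical sample catalog.
catalog = [
    DatasetMetadata("ods.orders", "mysql_prod", "table"),
    DatasetMetadata("dw.fact_orders", "hive_dw", "table", upstream=["ods.orders"]),
    DatasetMetadata("exports/orders.csv", "hdfs", "file", upstream=["dw.fact_orders"]),
]
```

For example, `group_by(catalog, "location")` shows which system holds each dataset. `relationships(catalog)` returns the derivation edges that lineage tools build on.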


3. If metadata is data that describes data, is there also data that describes metadata?

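Yes. Data that describes metadata is called a metamodel (Meta Model). The relationship among the metamodel, metadata, and data is illustrated in the figure below.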

[Figure: relationship between the metamodel, metadata, and data]


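The structure of metadata itself also has to be defined and standardized, and the metamodel is what does this. The main international metamodel standard is CWM (Common Warehouse Metamodel), published by the OMG. A mature metadata management tool should support the CWM standard.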
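
The three layers can be illustrated with a toy Python sketch. A metamodel declares which attributes a "table" metadata object must have. A metadata object describes one concrete table. Data rows are then checked against that metadata. This is only an analogy for the layering and is not CWM, which is a much richer UML/XMI-based specification. The names `TABLE_METAMODEL`, `conforms_to`, and `rows_conform_to` are hypothetical.

```python
# Layer 1, metamodel: which attributes every "table" metadata object must have, and their types.
TABLE_METAMODEL = {"name": str, "columns": list, "owner": str}


def conforms_to(metadata: dict, metamodel: dict) -> bool:
    """True if the metadata object has every attribute the metamodel requires, with the required type."""
    return all(key in metadata and isinstance(metadata[key], expected)
               for key, expected in metamodel.items())


# Layer 2, metadata: describes one concrete table (hypothetical example).
orders_metadata = {
    "name": "orders",
    "columns": [("order_id", int), ("amount", float)],  # (column name, Python type)
    "owner": "sales_team",
}

# Layer 3, data: the actual rows that the metadata describes.
orders_rows = [(1, 9.5), (2, 20.0)]


def rows_conform_to(rows: list[tuple], metadata: dict) -> bool:
    """True if every row has the declared number of columns and each value has the declared type."""
    types = [col_type for _, col_type in metadata["columns"]]
    return all(len(row) == len(types) and all(isinstance(v, t) for v, t in zip(row, types))
               for row in rows)
```

Each layer constrains the one below it. The metamodel validates metadata, and metadata validates data.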



II. Where does metadata come from?

Metadata is commonly grouped into four categories:


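  • Technical metadata: table structures, field constraints, data models, database details, etc.
  • Operational metadata: ETL programs (data processing, scheduling, exception handling), etc.
  • Business metadata: business metrics, business codes, business terms, etc.
  • Management metadata: data owners, accountability for data quality, data security levels, etc.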
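
One way to apply these categories is to tag each metadata attribute with its category, so that different teams see the part relevant to them. This is a minimal sketch: the attribute names in `ATTRIBUTE_CATEGORY` are hypothetical, and real tools define their own classifications.

```python
from enum import Enum


class MetadataCategory(Enum):
    TECHNICAL = "technical"      # table structures, field constraints, data models
    OPERATIONAL = "operational"  # ETL jobs: processing, scheduling, exception handling
    BUSINESS = "business"        # metrics, business codes, glossary terms
    MANAGEMENT = "management"    # owners, quality accountability, security levels


# Hypothetical attribute names mapped to the category they belong to.
ATTRIBUTE_CATEGORY = {
    "primary_key": MetadataCategory.TECHNICAL,
    "column_types": MetadataCategory.TECHNICAL,
    "etl_schedule": MetadataCategory.OPERATIONAL,
    "last_run_status": MetadataCategory.OPERATIONAL,
    "business_term": MetadataCategory.BUSINESS,
    "metric_definition": MetadataCategory.BUSINESS,
    "owner": MetadataCategory.MANAGEMENT,
    "security_level": MetadataCategory.MANAGEMENT,
}


def split_by_category(attributes: dict) -> dict:
    """Group a flat dict of metadata attributes by category; unknown attributes raise KeyError."""
    result = {category: {} for category in MetadataCategory}
    for key, value in attributes.items():
        category = ATTRIBUTE_CATEGORY.get(key)
        if category is None:
            raise KeyError(f"unclassified metadata attribute: {key}")
        result[category][key] = value
    return result


table_metadata = {
    "primary_key": "order_id",
    "etl_schedule": "daily 02:00",
    "business_term": "Order",
    "owner": "sales_team",
    "security_level": "internal",
}
grouped = split_by_category(table_metadata)
```

Raising on unclassified attributes is a design choice. It forces every new attribute to be assigned a category instead of silently landing nowhere.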




III. What can we do with metadata?

  • Metadata browsing


  • Data lineage and impact analysis




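A typical use case for impact analysis: during a business system upgrade, an organization changed the length of the TRADE_ACCORD field in the FINAL_ZENT table from 8 to 64 and needed to assess how the upgrade would affect downstream systems. Running impact analysis on the FINAL_ZENT metadata showed that related tables and ETL programs in the downstream DW layer were affected. Once the IT department had located the impact, it promptly updated the corresponding downstream programs and table structures and so prevented problems before they occurred. Impact analysis therefore makes it possible to pin down the effects of a metadata change quickly and to nip potential problems in the bud.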

  • Data hot/cold analysis (which data is frequently used and which is rarely touched)
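
The FINAL_ZENT impact analysis described above amounts to a graph traversal over lineage edges. The sketch below assumes lineage is stored as a simple adjacency dict from each node to its consumers. Apart from FINAL_ZENT, every node name is invented for illustration. Breadth-first search is used here as one straightforward choice of traversal. Running the same traversal on reversed edges gives upstream lineage.

```python
from collections import deque

# Hypothetical lineage: each node maps to the nodes that consume it (tables and ETL jobs).
LINEAGE = {
    "FINAL_ZENT": ["etl_load_dw_trade"],
    "etl_load_dw_trade": ["dw.trade_detail"],
    "dw.trade_detail": ["etl_build_trade_summary", "report.daily_trade"],
    "etl_build_trade_summary": ["dw.trade_summary"],
}


def reachable_from(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every node reachable from `start`, in breadth-first order (start itself excluded)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def reverse_edges(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip edge direction so that traversal walks upstream (lineage) instead of downstream (impact)."""
    reversed_graph: dict[str, list[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reversed_graph.setdefault(dst, []).append(src)
    return reversed_graph


impacted = reachable_from(LINEAGE, "FINAL_ZENT")
# impacted == ["etl_load_dw_trade", "dw.trade_detail", "etl_build_trade_summary",
#              "report.daily_trade", "dw.trade_summary"]
```

The `seen` set keeps the traversal from looping if the lineage graph ever contains a cycle.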

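Hot/cold analysis can be approximated by counting how often each dataset appears in access logs. This is a minimal sketch that assumes the logs have already been parsed into a list of dataset names, one entry per access. The three-tier split and the `hot_threshold` parameter are hypothetical choices, not a standard.

```python
from collections import Counter


def classify_heat(access_log: list[str], all_datasets: list[str], hot_threshold: int) -> dict[str, list[str]]:
    """Split datasets into hot (>= hot_threshold accesses), warm (some accesses) and cold (none)."""
    counts = Counter(access_log)  # missing keys count as 0
    result: dict[str, list[str]] = {"hot": [], "warm": [], "cold": []}
    for name in sorted(all_datasets):
        n = counts[name]
        if n >= hot_threshold:
            result["hot"].append(name)
        elif n > 0:
            result["warm"].append(name)
        else:
            result["cold"].append(name)
    return result


log = ["dw.trade_detail"] * 5 + ["dw.trade_summary"]
heat = classify_heat(log, ["dw.trade_detail", "dw.trade_summary", "ods.legacy_backup"], hot_threshold=3)
```

Here `ods.legacy_backup` ends up cold because it never appears in the log. Such datasets are candidates for archiving or cleanup.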


IV. Data Catalog for Digital Transformation


1. Introduction

Companies are starting their digital transformation to get more value from their data and to build a data-driven strategy. Unfortunately, most organizations govern their data in an ad hoc, firefighting manner, separately in different parts of the business and most of the time only within IT. Mapping data by building a data catalog is one of the first steps toward stronger governance and sustainability.

Gartner gives the following definition: “A data catalog maintains an inventory of data assets through the discovery, description, and organization of datasets. The catalog provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.”

But Gartner’s definition does not really differ from traditional metadata management, as it does not focus on what makes data catalogs so popular today: automation and collaboration. Excel-based or IT-driven data dictionaries are a thing of the past; the volume of data is too large and requires automation to scale. Data consumers want to access data and to enrich, comment on, and challenge its use and quality.

Let’s dare to give a definition: “A data catalog is an automated collection of metadata, combined with data management and search tools, that helps analysts and other data users find the data they need. It also serves as an inventory of available data and provides information to evaluate the fitness of data for intended uses.” In a few words, a data catalog is your organization’s metadata social network!

2. Data Catalog Objectives and Benefits


A data catalog aims to:

  • allow data citizens to find the data they need efficiently
  • empower organizations to quickly inventory, discover, manage, and understand all their data
  • move from tribal knowledge to centralized, crowdsourced knowledge
  • speed up the ingestion of new data sets and the use of new data
  • become the foundational layer for driving data governance, quality, and information security policies
  • foster collaboration between business and IT to build a shared understanding of the information


Its main benefits are:

  • Data catalogs increase efficiency, as they shorten the time analysts need to find and qualify the right data.
  • They also support data governance and risk mitigation by identifying personal and sensitive data, and by allowing you to establish and spread best practices in data management and data quality.
  • Finally, data management is simplified: new data sources can be onboarded more quickly, key assets can be easily identified and monitored, and redundant or untapped data can be detected and remediated. In the end, the data ecosystem becomes more rationalized and agile.
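
Identifying personal and sensitive data is often bootstrapped with simple rules on column names and sample values. The sketch below is deliberately naive. The regular expressions and tag names are hypothetical, they will produce false positives and misses, and production classifiers combine many more signals.

```python
import re

# Hypothetical rules on column names (case-insensitive substring matches).
NAME_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile|tel", re.IGNORECASE),
    "national_id": re.compile(r"ssn|id_card|national_id", re.IGNORECASE),
}
# Hypothetical rules on sample values: a tag applies only if every sample matches.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def tag_sensitive_columns(columns: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map column name -> set of sensitivity tags; columns with no tag are omitted."""
    tags: dict[str, set[str]] = {}
    for column, samples in columns.items():
        found = {tag for tag, pattern in NAME_PATTERNS.items() if pattern.search(column)}
        for tag, pattern in VALUE_PATTERNS.items():
            if samples and all(pattern.match(value) for value in samples):
                found.add(tag)
        if found:
            tags[column] = found
    return tags


sample_columns = {
    "customer_email": ["a@x.com"],
    "contact": ["b@y.org", "c@z.net"],  # name gives no hint; values do
    "amount": ["10", "20"],
    "mobile_no": ["13800000000"],
}
sensitive = tag_sensitive_columns(sample_columns)
```

A catalog would store these tags as metadata, so that governance policies such as masking or access control can be driven from them.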

3. Data Catalog Features

Most of the existing solutions rely on the following four main components:

  • A flexible data model for storing the metadata objects and their relationships
  • A set of data discovery services that extract metadata from structured and unstructured data sources and enrich it (discovering, scoring) with additional information and insight
  • Search and indexing services that make the information available as quickly as possible and support complex search queries
  • An intuitive, easy-to-use, and collaborative user interface so that any kind of user can search for and find what they need
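
The first three components can be sketched with Python's standard library. A plain dict serves as the (very simple) data model. A discovery service reads table and column metadata from a SQLite database through `sqlite_master` and `PRAGMA table_info`, both of which are standard SQLite features. An inverted index then provides keyword search over table and column names. The function names are hypothetical and the user interface is not shown. This is a sketch, not how any particular catalog product works.

```python
import re
import sqlite3
from collections import defaultdict


def discover_sqlite(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Discovery service: extract table -> [(column name, declared type)] from a SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
        # Simple quoting only; assumes table names contain no double quotes.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [(col[1], col[2]) for col in columns]
    return metadata


def tokenize(text: str) -> list[str]:
    """Lowercase and split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(metadata: dict[str, list[tuple[str, str]]]) -> dict[str, set[str]]:
    """Indexing service: inverted index from token -> set of tables whose name or columns contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for table, columns in metadata.items():
        for text in [table] + [name for name, _ in columns]:
            for token in tokenize(text):
                index[token].add(table)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Search service: tables matching every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_orders (order_id INTEGER, customer_email TEXT, amount REAL)")
conn.execute("CREATE TABLE customer_profile (customer_id INTEGER, email TEXT)")
index = build_index(discover_sqlite(conn))
```

With this setup, `search(index, "customer email")` returns both tables and `search(index, "amount")` returns only `customer_orders`. Real catalogs add ranking, synonyms, and crowdsourced descriptions on top of this kind of index.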


V. Metadata management tools

LinkedIn DataHub:

  • https://engineering.linkedin.com/blog/2019/data-hub
  • https://github.com/linkedin/datahub