Challenges

Data Governance Innovation Lab is committed to simplifying data management and driving data monetization in the era of big data.


Business Objectives

Data Governance Innovation Lab aims to create technical competitiveness for HUAWEI CLOUD intelligent data services, achieve collaborative analysis of data in different forms, build shared and open ecosystems, and release the power of data.

Open Ecosystems and Accelerated Generation of Data Assets, Releasing the Power of Data

1.  Virtual asset federation and data management in every domain

2.  Open architectures and models; joint development of industry assets with partners

3.  Full-link tools for helping partners accelerate the generation of data assets

Open Source for Building a Differentiated Data Computing Ecosystem

1. Data acceleration layer: a subsecond-level interactive engine for the logical lake to implement cross-source and cross-domain collaboration; optimized data organizations for building a data acceleration layer; combination of data analysis and data organization for a differentiated computing ecosystem

2. +Kunpeng: multi-core acceleration for hotspot computing models, providing the most cost-effective unit data processing

3. +AI: next-generation computing mode explored using vector calculation to enable intelligent engines

Unified Management with One View of All Data in a Data Lake

1. Unified metadata and data cataloging in a domain

2. Unified security management and privacy protection for sensitive data

Research

  • Big Data Technology

  • Intelligent Governance

  • Intelligent Visualization

Big Data Technology
  • Intelligent Data Value Exploration Platform

    Traditional data analysis, integration, governance, and development are driven by individual service requirements.

    The future is an era of data-driven innovation, in which discovering data value and new service scenarios in massive data through open-ended, exploratory analysis will become the norm. We are therefore exploring an intelligent data exploration platform that supports random, information-guided exploration to help customers discover value.

  • Next-Generation Intelligent Data Lake Computing Mode Powered by Vector Calculation

    Factors of AI such as feature vectorization, confidence, and probability pose new requirements on data computing and storage.

    Combining vector calculation with statistical analysis can guide the exploration of the next generation of big data computing.
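
As an illustration of how vector calculation and statistical analysis might meet in a single query, the sketch below first selects rows by feature-vector similarity (keeping the similarity score as a confidence value) and then aggregates over the selected rows. It is a minimal sketch under stated assumptions, not the lab's engine: the row layout and the names `cosine_similarity`, `select_similar`, and `average_amount` are hypothetical.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(rows, query, min_confidence):
    """Vector step: keep rows whose feature vector is close to the query."""
    selected = []
    for row in rows:
        score = cosine_similarity(row["features"], query)
        if score >= min_confidence:
            # Carry the similarity along as a confidence value (assumed convention).
            selected.append({**row, "confidence": score})
    return selected


def average_amount(rows):
    """Statistical step: aggregate over the rows the vector step selected."""
    return mean(row["amount"] for row in rows) if rows else None


# Hypothetical sample data: a feature vector plus an ordinary numeric column.
ROWS = [
    {"id": 1, "features": [1.0, 0.0], "amount": 100.0},
    {"id": 2, "features": [0.9, 0.1], "amount": 200.0},
    {"id": 3, "features": [0.0, 1.0], "amount": 900.0},
]

matches = select_similar(ROWS, query=[1.0, 0.0], min_confidence=0.9)
print([m["id"] for m in matches], average_amount(matches))
```

A production engine would push both steps into the same execution plan; the sketch only shows that the similarity filter and the aggregate operate on the same rows.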

Intelligent Governance
  • Intelligent Data Detection, Repair, Association, and Sampling

    Intelligent data quality detection and repair, association, entity merging, sampling, and comprehensive profiling

  • Intelligent Data Asset Management Engine

    Federated metadata management for data assets across public cloud, private cloud, and local data sources; tens of millions of metadata entries and their relationships, queried with millisecond-level latency; unstructured metadata governance with fuzzy retrieval and recommendation of images, video, and text; a real-time data lake metadata system for unified metadata management of big data clusters with more than 20,000 nodes

  • Intelligent Data Security Management Engine

    Full-link security governance: algorithms for various GDPR-compliant data classification and masking scenarios, including data labeling and watermarking

  • Intelligent Data Quality Engine

    Intelligent data quality algorithms: algorithms for abnormal data detection and repair, entity merging, and data column association, with accuracy and recall above 90% on all datasets

    High-performance data quality engine: TB-level data quality detection in seconds, with distributed memory cache and automatic scaling

  • Intelligent Model-Driven Engine

    Model-driven data development: intelligent data pipeline construction and data asset generation based on models

  • High-Performance Cross-Source Query Optimizer

    Support for multiple computing engines, such as Hive, Spark, HBase, and MySQL, with cross-region and cross-engine scheduling and optimization; performance improved by over 10 times compared with open-source Rheem and Calcite

  • Intelligent Hybrid Data Lake Scheduling Engine

    Cross-region data resource scheduling, data resource scheduling across public cloud and HCS hybrid cloud, and AI operator scheduling; concurrent scheduling of millions of nodes during peak hours
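
The abnormal data detection and repair listed under the Intelligent Data Quality Engine can be pictured with a small example. The sketch below flags outliers in a numeric column using the modified z-score (median and median absolute deviation) and repairs them with the median of the remaining values. This is one common technique chosen for illustration, not the lab's algorithm; the function names, the 3.5 threshold, and the sample readings are assumptions.

```python
from statistics import median


def detect_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so no score can be computed.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


def repair_outliers(values, threshold=3.5):
    """Replace each detected outlier with the median of the non-outlier values."""
    bad = set(detect_outliers(values, threshold))
    if not bad:
        return list(values)
    clean_median = median([v for i, v in enumerate(values) if i not in bad])
    return [clean_median if i in bad else v for i, v in enumerate(values)]


# Hypothetical sensor readings with one obviously abnormal value.
READINGS = [10.0, 11.0, 9.0, 10.5, 500.0, 9.5]

print(detect_outliers(READINGS))
print(repair_outliers(READINGS))
```

Median-based scoring is used here because a single extreme value barely shifts the median, whereas it would inflate a mean and standard deviation and could hide itself.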

Intelligent Visualization
  • Visualized Development Recommendation Engine Based on Machine Learning

    Intelligent industry module recommendation for visualized screens: template recommendation based on users' industry background; smart optimization assistance for visualized screens: one-click optimization (color matching and layout) through machine learning; scenario-based visualized modeling and development platforms, such as 3D city and 3D campus, with device-edge-cloud big data input and visualized interaction and presentation