Data Lake Insight

Data Lake Insight (DLI) is a fully managed big data processing and analysis service based on Apache Spark. Using SQL and Spark programs, DLI lets you gain insights from heterogeneous data across various cloud services without migrating the data.

Pay per use, ¥1.4/hour per compute unit

  • You are advised to deploy DLI and  together to provide the one-stop IDE platform for data development.

Product Advantages
  • Compatible & Open

    Seamlessly migrates offline Spark applications to the cloud based on the open-source Apache Spark ecosystem and APIs, reducing your migration workload.

  • Powerful Computing Power

    Adopts a highly scalable big data architecture to process data at the TB-EB scale, allowing you to handle data analysis requests in various scenarios with ease.

  • Excellent Performance

    Uses an in-memory computing model, a DAG scheduling framework, and an efficient optimizer to deliver overall performance 100 times that of the traditional MapReduce (MR) model.

  • Low Costs

    Bills you based on usage time. The pricing unit of DLI is the compute unit (CU). A CU contains four cores and 16 GB of memory. DLI bills you ¥1.4 per CU per hour.
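The billing rule above can be turned into a quick estimate. The following is a minimal sketch that assumes strictly linear billing with no minimum charge and no rounding of usage time; neither is stated in this page, and the function names are illustrative only.

```python
from decimal import Decimal

# Figures taken from the text above.
PRICE_PER_CU_HOUR = Decimal("1.4")  # CNY per CU per hour
CORES_PER_CU = 4
MEMORY_GB_PER_CU = 16


def estimate_cost(cus, hours):
    """Estimated charge in CNY for `cus` compute units used for `hours` hours.

    Assumption: billing is linear in CUs and hours (no minimums, no rounding).
    """
    return PRICE_PER_CU_HOUR * cus * Decimal(str(hours))


def resources(cus):
    """Total (cores, memory in GB) provided by `cus` compute units."""
    return cus * CORES_PER_CU, cus * MEMORY_GB_PER_CU


if __name__ == "__main__":
    cores, memory_gb = resources(16)
    print(f"16 CUs = {cores} cores, {memory_gb} GB memory")
    print(f"Estimated cost for 2.5 hours: CNY {estimate_cost(16, 2.5)}")
```

Decimal is used instead of float so that currency amounts are not affected by binary rounding.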


  • Standard SQL

    You can connect to DLI using JDBC or an SDK. DLI complies with ANSI SQL 2003, so you can analyze massive volumes of data without having to deploy or operate SQL engines yourself.

  • Serverless Spark

    DLI offers full-stack Spark capabilities, such as Spark SQL, Spark Streaming, and Spark Batch based on the Apache Spark ecosystem, and helps you analyze data at the TB-EB scale with standard SQL or Spark APIs.

  • Enterprise-class Multi-Tenancy

    Computing resources are isolated between tenants to meet job SLAs. Data permissions can be restricted to a specific table or column, which supports data sharing between departments and permission management.

  • SQL on AI

    DLI integrates the capabilities of processing and analyzing images, videos, and languages in SQL to offer convergent analysis for structured and unstructured data.

  • Federated Analysis of Heterogeneous Data Sources

    DLI can work with multiple data formats, such as CSV, JSON, Parquet, ORC, and CarbonData, and supports federated analysis on data from multiple cloud services (for example, OBS, DWS, CloudTable, and RDS) without data migration, helping you innovate quickly and get valuable insights from data.

  • Auto Scaling

    Auto scaling of storage and computing resources allows you to query data without worrying about whether you have sufficient resources.

Application Scenarios
  • Large-scale Log Analysis

  • Federated Analysis of Heterogeneous Data Sources

  • Big Data ETL

Large-scale Log Analysis

Game Operation Data Analysis

Different departments of a game company analyze new daily logs on the game data analysis platform to obtain the metrics they need and make decisions based on them. For example, the operations department obtains metrics such as new players, active players, retention rate, churn rate, and payment rate to understand the current state of the game and determine follow-up actions. The placement department obtains the channel sources of new and active players to decide which platforms to use for placement in the next cycle.


Efficient Spark Programming Model

DLI uses Spark Streaming to ingest data directly from DIS and perform preprocessing such as data cleaning. You only need to write the processing logic and do not need to manage the multithreading model.
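To illustrate what "processing logic" can look like in practice, here is a minimal sketch of a cleaning step written as a plain Python function. The pipe-delimited log format and the field names are hypothetical, not taken from DLI or DIS. In a Spark Streaming job, a function like this would typically be applied to each record with the stream's `map` and `filter` operations, while the framework handles parallel execution.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical raw log format: "timestamp|player_id|event|channel"
FIELD_NAMES = ("timestamp", "player_id", "event", "channel")


def clean_log_line(line: str) -> Optional[Dict[str, str]]:
    """Parse one raw log line; return None if it is malformed."""
    parts = [part.strip() for part in line.split("|")]
    # Drop lines with the wrong number of fields or with empty fields.
    if len(parts) != len(FIELD_NAMES) or not all(parts):
        return None
    record = dict(zip(FIELD_NAMES, parts))
    # Normalize case so that later grouping is consistent.
    record["event"] = record["event"].lower()
    record["channel"] = record["channel"].lower()
    return record


def clean_batch(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Clean a batch of lines, discarding malformed ones."""
    cleaned = (clean_log_line(line) for line in lines)
    return [record for record in cleaned if record is not None]


if __name__ == "__main__":
    raw = [
        " 2024-01-01T08:00:00 | p1 | LOGIN | AppStore ",
        "corrupted line",
        "2024-01-01T08:05:00|p2||web",
    ]
    for record in clean_batch(raw):
        print(record)
```

The function contains no threading or scheduling code, so the same logic can be tested locally and then handed to the streaming framework.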

Easy to Use

You can write metric analysis logic in standard SQL statements without having to deal with the complexity of the distributed computing platform.
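As an illustration of metric logic expressed in standard SQL, the sketch below computes daily active players and new players from a login log. It runs on Python's built-in `sqlite3` module purely as a local stand-in for a SQL engine; it does not use DLI, and the `login_events` table and its columns are hypothetical.

```python
import sqlite3

DAILY_METRICS_SQL = """
    WITH first_seen AS (
        -- First day on which each player appears in the log.
        SELECT player_id, MIN(event_date) AS first_date
        FROM login_events
        GROUP BY player_id
    )
    SELECT e.event_date,
           COUNT(DISTINCT e.player_id) AS active_players,
           COUNT(DISTINCT CASE WHEN f.first_date = e.event_date
                               THEN e.player_id END) AS new_players
    FROM login_events AS e
    JOIN first_seen AS f ON f.player_id = e.player_id
    GROUP BY e.event_date
    ORDER BY e.event_date
"""


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database with a few sample login events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE login_events (player_id TEXT, event_date TEXT)")
    conn.executemany(
        "INSERT INTO login_events VALUES (?, ?)",
        [
            ("p1", "2024-01-01"),
            ("p2", "2024-01-01"),
            ("p1", "2024-01-02"),
            ("p1", "2024-01-02"),  # repeated login on the same day
            ("p3", "2024-01-02"),
        ],
    )
    return conn


def daily_metrics(conn: sqlite3.Connection):
    """Return rows of (event_date, active_players, new_players)."""
    return conn.execute(DAILY_METRICS_SQL).fetchall()


if __name__ == "__main__":
    for row in daily_metrics(build_sample_db()):
        print(row)
```

Retention, churn, and payment rates can be expressed in the same way, as joins and aggregations over the log tables.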

Pay per Use

Log analysis is scheduled periodically based on time requirements, with long idle periods between consecutive runs. DLI uses the pay-per-use billing mode and bills you only for the resources used during scheduled runs, which reduces costs by more than 50% compared with the exclusive cluster mode.

Federated Analysis of Heterogeneous Data Sources

Digital Service Transformation for Car Companies

Facing new competitive pressure and changes in travel services, car companies build Internet of Vehicles (IoV) cloud platforms and in-vehicle infotainment (IVI) operating systems to connect Internet applications with vehicle use scenarios and complete their digital service transformation. This delivers a better travel experience for vehicle owners, increases the competitiveness of car companies, and promotes sales growth. For example, car companies can collect and analyze daily vehicle metric data (such as data on batteries, engines, tire pressure, and airbags) and promptly send maintenance suggestions to vehicle owners.


No Need for Migration in Multi-source Data Analysis

RDS stores the basic information about vehicles and vehicle owners, CloudTable stores real-time vehicle location and health status information, and DWS stores periodic metric statistics. DLI allows federated analysis on data from multiple sources without data migration.
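The idea of querying several stores in one statement, without first copying data into a single place, can be sketched locally. The example below uses SQLite's `ATTACH DATABASE` on three separate in-memory databases as a stand-in for RDS, CloudTable, and DWS. This is only an analogy: it does not show DLI's actual cross-source connection syntax, and all table and column names are hypothetical.

```python
import sqlite3

FEDERATED_SQL = """
    SELECT v.vin, v.owner_name, s.tire_pressure_kpa, m.avg_daily_km
    FROM rds.vehicles AS v
    JOIN cloudtable.vehicle_status AS s ON s.vin = v.vin
    JOIN dws.weekly_metrics AS m ON m.vin = v.vin
    WHERE s.tire_pressure_kpa < 200
    ORDER BY v.vin
"""


def build_sources() -> sqlite3.Connection:
    """Attach three independent in-memory databases, one per simulated source."""
    conn = sqlite3.connect(":memory:")
    # Attach first: ATTACH is not allowed inside an open transaction.
    for name in ("rds", "cloudtable", "dws"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")

    conn.execute("CREATE TABLE rds.vehicles (vin TEXT, owner_name TEXT)")
    conn.execute(
        "CREATE TABLE cloudtable.vehicle_status (vin TEXT, tire_pressure_kpa INTEGER)"
    )
    conn.execute("CREATE TABLE dws.weekly_metrics (vin TEXT, avg_daily_km REAL)")

    conn.executemany(
        "INSERT INTO rds.vehicles VALUES (?, ?)",
        [("VIN001", "Li"), ("VIN002", "Wang")],
    )
    conn.executemany(
        "INSERT INTO cloudtable.vehicle_status VALUES (?, ?)",
        [("VIN001", 180), ("VIN002", 230)],
    )
    conn.executemany(
        "INSERT INTO dws.weekly_metrics VALUES (?, ?)",
        [("VIN001", 42.5), ("VIN002", 18.0)],
    )
    return conn


def vehicles_needing_attention(conn: sqlite3.Connection):
    """Join owner data, live status, and periodic statistics in one query."""
    return conn.execute(FEDERATED_SQL).fetchall()


if __name__ == "__main__":
    for row in vehicles_needing_attention(build_sources()):
        print(row)
```

Each table stays in its own database; only the query spans them, which mirrors the no-migration pattern described above.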

Tiered Data Storage

Car companies need to retain all historical data to support auditing and other services that require infrequent data access. Warm and cold data is stored in OBS and frequently accessed data is stored in CloudTable and DWS, reducing the overall storage cost.

Big Data ETL

Geographic Big Data Analysis

Geographic data exhibits typical big data characteristics: large data volume (for example, global satellite remote sensing imagery is generated at the PB scale) and numerous data varieties (for example, structured remote sensing image raster data, vector data, unstructured spatial location data, and 3D modeling data). The challenge for users is how to apply efficient mining tools and methods to extract insights from this volume of data.


Flexible to Explore

DLI supports full-stack Spark capabilities and provides a rich set of Spark operators for spatial data analysis. It fully supports offline batch processing of massive volumes of data, such as structured remote sensing image data and unstructured 3D modeling and laser point cloud data, as well as real-time computing on dynamic streaming data with location attributes.

Big Data ETL

DLI allows you to quickly migrate remote sensing image data at the TB-EB scale and slice the image data into resilient distributed datasets (RDDs) for distributed batch computing.
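Image slicing can be pictured as cutting a large raster into fixed-size tiles that are then processed independently. The sketch below only computes the tile boundaries; the tile size and function name are assumptions for illustration and do not describe how DLI slices images internally. In a Spark program, a list of tile descriptors like this could be distributed with `SparkContext.parallelize` so that each worker reads and processes its own tiles.

```python
from typing import Iterator, Tuple

# (x_offset, y_offset, width, height) in pixels
Tile = Tuple[int, int, int, int]


def slice_into_tiles(width: int, height: int, tile_size: int) -> Iterator[Tile]:
    """Yield tile bounds that cover a width x height image without overlap.

    Tiles on the right and bottom edges are smaller when the image size
    is not a multiple of tile_size.
    """
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height, and tile_size must be positive")
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))


if __name__ == "__main__":
    tiles = list(slice_into_tiles(5000, 3000, 2048))
    print(f"{len(tiles)} tiles")
    for tile in tiles:
        print(tile)
```

Because the tiles do not overlap and together cover the whole image, per-tile results can be aggregated without double counting.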
