Pangu-Weather from Huawei Cloud Outperforms NWP Methods in Terms of Accuracy for Medium-Range Forecast

Huawei Cloud Jul 07, 2023

On July 6, our paper titled "Accurate medium-range global weather forecasting with 3D neural networks" was published in Nature. This article introduces the main technical points and representative forecast results of Pangu-Weather. For technical details, see the paper published in Nature: Accurate medium-range global weather forecasting with 3D neural networks.


Publication page in Nature: https://www.nature.com/articles/s41586-023-06185-3

Since the 1920s, and especially in the past 30 years, numerical weather prediction (NWP) has made great progress in daily weather forecasting, extreme weather warning, and climate change prediction thanks to rapidly growing computing power. However, as the growth of computing power slows and physical models become more complex, the bottlenecks of traditional NWP methods are becoming increasingly apparent. Researchers are therefore exploring new methods for weather forecasting, such as deep learning-based methods. In the fields where NWP methods are most widely used, such as medium-range forecasting, existing AI-based methods deliver significantly lower accuracy than NWP. These AI-based methods are also limited by problems such as a lack of explainability and poor forecasts of extreme weather.

To address these issues, Huawei Cloud proposes a new AI-based, high-resolution global weather forecast system: Pangu-Weather (also named Pangu Meteorology Model). For the first time, an AI-based method outperforms NWP methods in terms of accuracy. Pangu-Weather offers higher forecast accuracy than the NWP reference (operational IFS[1] of ECMWF) at all lead times from one hour to one week, while running 10,000 times faster. It produces a global weather forecast in seconds, covering variables such as geopotential, humidity, wind speed, temperature, and mean sea level pressure (MSLP). The model uses a spatial resolution of 0.25° x 0.25° across 13 pressure levels, with a time granularity of 1 hour. As a pre-trained model, Pangu-Weather can be directly applied to multiple downstream scenarios. For example, in tropical cyclone tracking, Pangu-Weather provides significantly higher accuracy than ECMWF-HRES.

1 Introduction: Can AI-based weather forecast methods surpass NWP methods?

Weather forecasting is one of the most important applications of scientific computing. It predicts future weather changes, especially extreme weather events such as rainstorms, typhoons, droughts, and extreme cold. Conventional NWP methods mostly follow a simulation-based paradigm that formulates the physical rules of atmospheric states as partial differential equations (PDEs) and solves them using numerical simulations. They have achieved remarkable success in the past three decades. However, as the growth of computing power slows and physical models become more complex, these NWP methods are running into bottlenecks. First, NWP consumes a great deal of computing power. For example, with a spatial resolution of 0.25° x 0.25°, a single simulation for a 10-day forecast can take hours of computation on more than 3000 nodes of a supercomputer. Second, the parametric numerical models that NWP methods largely rely on are often considered inadequate, because parameterizing unresolved processes introduces errors.

AI-based weather forecast methods have achieved great success in short-range forecasting thanks to their fast prediction speed. NWP methods cannot provide minute-level weather forecasts, whereas AI-based methods can fit radar reflectivity and surpass extrapolation methods such as optical flow. However, in medium-range weather forecasting (one of the most successful fields of NWP), AI-based methods, though faster, have provided lower resolution and accuracy than NWP methods. In March 2022, NVIDIA launched the FourCastNet model[2], which increased the spatial resolution to 0.25° x 0.25°, comparable to the ECMWF IFS. However, the forecast accuracy of FourCastNet is still unsatisfactory. For example, the root mean square errors (RMSEs) of its 5-day Z500 forecast using a single model and a 100-member ensemble are 484.5 and 462.5, respectively, much worse than the 333.7 reported by operational IFS of ECMWF. Prior to the launch of Pangu-Weather, AI-based methods were mainly used as a fast complement to NWP rather than a substitute for it. Some meteorologists even believed that there was still a long way to go before AI-based methods could surpass conventional NWP methods[3].

For the first time in history, Pangu-Weather surpasses existing NWP methods in medium-range weather forecasting.

Pangu-Weather is trained and tested on the 5th generation of ECMWF reanalysis (ERA5) data. The researchers download 43 years (1979–2021) of global weather data, of which they use the 1979–2017 data for training, the 2019 data for validation, and the 2018, 2020, and 2021 data for testing. They choose 13 pressure levels, each with 5 important variables (temperature, humidity, geopotential, u-component of wind speed, and v-component of wind speed), and the surface level with 4 variables (2-meter temperature, u-component of 10-meter wind speed, v-component of 10-meter wind speed, and MSLP). Figure 1 shows some key results of Pangu-Weather. Pangu-Weather outperforms the existing weather forecast systems it is compared with (including operational IFS of ECMWF) on all tested variables. For example, Pangu-Weather reports a 5-day Z500 RMSE of 296.7, significantly better than operational IFS and FourCastNet, which reported 333.7 and 462.5, respectively. In addition, the inference cost of Pangu-Weather is merely 1.4 seconds on an NVIDIA Tesla-V100 GPU for a 24-hour global weather forecast, more than 10,000 times faster than operational IFS.
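To give a feel for the size of the data described above, the following minimal sketch counts the values in one global weather state. It assumes that the 0.25° grid includes both poles (721 latitudes) and wraps around in longitude (1440 points); the constant names are illustrative and do not come from the paper.

```python
# Illustrative sizing of one global weather state at 0.25° resolution.
# Assumption: latitudes run from 90°S to 90°N inclusive, longitudes from
# 0° to 359.75°. All names here are hypothetical.
RESOLUTION_DEG = 0.25
N_PRESSURE_LEVELS = 13
N_UPPER_AIR_VARS = 5  # temperature, humidity, geopotential, u-wind, v-wind
N_SURFACE_VARS = 4    # 2 m temperature, 10 m u-wind, 10 m v-wind, MSLP

n_lat = int(180 / RESOLUTION_DEG) + 1  # both poles included
n_lon = int(360 / RESOLUTION_DEG)      # periodic in longitude

# Each (level, variable) pair and each surface variable is one 2D field.
n_fields = N_PRESSURE_LEVELS * N_UPPER_AIR_VARS + N_SURFACE_VARS
values_per_state = n_fields * n_lat * n_lon

print(f"grid: {n_lat} x {n_lon}, fields: {n_fields}")
print(f"values per global state: {values_per_state:,}")
```

Under these assumptions, a single time step holds roughly 72 million values, which helps explain why the network design has to keep computational cost under control.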

Figure 1a Pangu-Weather offers significantly higher accuracy than operational IFS of ECMWF and FourCastNet of NVIDIA across different weather variables and different months

Figure 1b Visualization of 3-day weather forecast produced by Pangu-Weather on two variables and comparison with ground-truth values

Figure 1c Pangu-Weather forecasts the paths of two strong tropical cyclones in 2018, Kong-rey and Yutu, more accurately than conventional NWP methods. Pangu-Weather makes the correct forecast (Yutu heads toward the Philippines) as early as 6 days before landing, 48 hours earlier than ECMWF-HRES.

2 Method: 3D neural network and hierarchical temporal aggregation

Since meteorological data and image data share many similarities, the researchers wondered whether existing computer vision (CV) models could be used to analyze meteorological data. That is the origin of the idea behind Pangu-Weather. Based on existing work (including FourCastNet of NVIDIA), the Pangu-Weather researchers identified two main reasons why AI-based methods fell behind NWP methods in forecast accuracy. First, existing AI-based methods were often built on 2D neural networks and could not process uneven 3D weather data well. Second, these AI-based methods lacked real-world constraints and suffered from cumulative forecast errors. As described in the paper, the researchers use a 3D Earth-Specific Transformer (3DEST) to process complex, uneven 3D meteorological data, and a hierarchical temporal aggregation strategy to reduce the number of prediction iterations and thereby the iteration errors.

3 Neural network architecture: 3D Earth-Specific Transformer (3DEST)

Figure 2 shows the overall architecture of 3DEST. The main idea is to use a 3D variant of the vision transformer[4] to handle complex, uneven weather variables. Given the high resolution of meteorological data, the researchers reduce the network encoder and decoder to two levels (eight blocks) and use the window-attention mechanism of Swin transformers[5] to further reduce computational costs. Note that even with these measures, the network still requires more than 3000 G floating-point operations (FLOPs), which is a count of operations rather than a per-second rate. In the future, with sufficient computing power, a larger network could be used to further improve forecast accuracy.

Figure 2 Overall architecture of 3DEST

After careful analysis of the nature of meteorological data, the researchers apply an earth-specific positional bias in each network block. This is the most important improvement. The biggest difference between weather data and common image data is that each pixel on the feature map corresponds to an absolute position on the Earth, whereas pixels in an image usually carry only relative position information. In addition, as shown in Figure 3, the longitude-latitude grid on which the weather variables are defined is uneven, and different variables are distributed unevenly across latitudes and heights. Modeling this unevenness helps the network learn the complex physical laws behind meteorological data, such as the Coriolis force. In the paper, a positional bias that depends on latitude and height is introduced in each transformer module to learn the irregular component of each spatial operation. The modified transformer module is called the 3D Earth-Specific Transformer. Please read the paper for more technical details.
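As a rough illustration of why a regular longitude-latitude grid is uneven on the sphere, the sketch below estimates how the area of a grid cell shrinks toward the poles. It uses the small-cell approximation that the east-west extent of a cell scales with the cosine of its latitude. The function name is hypothetical and the code is not taken from the paper.

```python
import math


def relative_cell_area(lat_deg: float) -> float:
    """Approximate area of a lat-lon grid cell relative to one at the equator.

    For small cells the north-south extent is constant, while the east-west
    extent shrinks with cos(latitude), so the area ratio is about cos(latitude).
    """
    return math.cos(math.radians(lat_deg))


for lat in (0, 30, 60, 85):
    print(f"lat {lat:2d} deg: relative cell area ~ {relative_cell_area(lat):.3f}")
```

A pixel at 60° latitude covers only about half the surface area of a pixel at the equator, so the same convolution or attention window spans very different physical distances depending on where it sits. This is the kind of position dependence that a bias tied to absolute latitude can absorb.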

Figure 3 Reasons for using an earth-specific positional bias.

Left: The longitude and latitude grids correspond to an uneven spatial distribution on the Earth's sphere.

Middle: The geopotential height is closely related to the latitude and height.

Right: The mean wind speed and temperature are closely related to the vertical height.

4 Hierarchical temporal aggregation strategy

For medium-range weather forecasting, a model must be called iteratively multiple times. For example, FourCastNet trained a base model for 6-hour forecast, so performing a 7-day forecast required executing the model 28 times iteratively. Forecast errors can grow rapidly because AI-based methods often do not consider real-world constraints. As shown in Figure 4, when the researchers mimic FourCastNet and execute the 6-hour model 28 times to reach a 7-day forecast, the forecast accuracy is clearly lower than that of the 24-hour model executed 7 times. When the 1-hour model is executed 168 times, a superlinear growth in forecast errors can be observed.

Figure 4 Comparison of 7-day forecast accuracy using different models

To alleviate iteration errors, the paper proposes a straightforward yet effective strategy named hierarchical temporal aggregation. The researchers train four individual models for 1-hour, 3-hour, 6-hour, and 24-hour prediction, respectively. They use a greedy algorithm to guarantee the minimal number of iterations. For example, for a 24-hour forecast, they execute the 24-hour model only once, whereas for a 23-hour forecast, they execute the 6-hour model 3 times, followed by the 3-hour model once and the 1-hour model twice. By using multiple models with different lead times, Pangu-Weather reduces iteration errors without resorting to recursive training, and it also reduces the computational resources consumed. During training, each Pangu-Weather model is supervised with the weather state at a single time point. In contrast, existing AI-based methods (such as FourCastNet) usually supervise weather states at multiple time points to reduce iteration errors, which multiplies GPU memory consumption and training time while reducing the stability of the training process.
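The greedy selection described above can be sketched as follows. This is a minimal illustration under the assumption that the lead time is a whole number of hours and that the available models cover 24, 6, 3, and 1 hours; the function name and return format are hypothetical, not the paper's code.

```python
def greedy_schedule(lead_hours, model_hours=(24, 6, 3, 1)):
    """Return [(model_lead_time, number_of_calls), ...] for a target lead time.

    Always applies the longest available model as many times as it fits,
    then moves on to the next shorter one.
    """
    if lead_hours < 0:
        raise ValueError("lead_hours must be non-negative")
    schedule = []
    remaining = lead_hours
    for step in sorted(model_hours, reverse=True):
        calls, remaining = divmod(remaining, step)
        if calls:
            schedule.append((step, calls))
    if remaining:
        raise ValueError("lead time cannot be composed from the given models")
    return schedule


for hours in (23, 24, 7 * 24):
    plan = greedy_schedule(hours)
    total_calls = sum(calls for _, calls in plan)
    print(f"{hours:3d} h -> {plan} ({total_calls} model calls)")
```

Because each lead time divides the next longer one (1, 3, 6, 24), the greedy choice also yields the smallest possible number of model calls. A 7-day forecast therefore needs 7 calls of the 24-hour model instead of 28 calls of a 6-hour model.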

[Computing power consumption] Using hourly weather data from the training period (1979–2017), the researchers train each model for 100 epochs, which takes 16 days on 192 NVIDIA Tesla-V100 GPUs. The 100 epochs are actually insufficient for the training procedure to reach full convergence. In other words, with more computing power, the accuracy of AI-based weather forecasting can be further improved. To make a 24-hour global weather forecast, Pangu-Weather needs 1.4 seconds on a single NVIDIA Tesla-V100 GPU, 10,000 times faster than conventional NWP methods.

5 Experimental results

The paper evaluates Pangu-Weather on two datasets. The first is the ERA5 data[6] of 2018, 2020, and 2021, which is mainly used to test the overall forecast accuracy of Pangu-Weather. The second is the International Best Track Archive for Climate Stewardship (IBTrACS) dataset[7], which is used to evaluate the ability to track tropical cyclones, a special case of extreme weather forecasting. The baselines include the most advanced NWP method (operational IFS offered by ECMWF, downloaded from the TIGGE archive) and an AI-based method (the forecast accuracy reported in NVIDIA's FourCastNet paper).

5.1 Deterministic forecast results

[Upper-air atmospheric variables] As shown in Figure 1 and Figure 5, the accuracy of Pangu-Weather on the upper-air atmospheric variables Z500, T850, T500, Q500, U500, and V500 is higher than that of operational IFS at all forecast times. For example, for Z500, the 3-day and 5-day RMSEs (in m²/s²) of operational IFS are 152.8 and 333.7, respectively, and Pangu-Weather reduces them to 134.5 and 296.7. For T850, the 3-day and 5-day RMSEs (in K) of operational IFS are 1.37 and 2.06, respectively, and Pangu-Weather reduces them to 1.14 and 1.79. The relative drop in RMSE is more than 10% in all these scenarios. Measured as the difference in lead time at which two systems reach the same accuracy, Pangu-Weather delivers a forecast time gain of 10–15 hours over operational IFS (that is, a Pangu-Weather forecast is as accurate as an operational IFS forecast whose lead time is 10–15 hours shorter). Compared with FourCastNet, Pangu-Weather delivers even larger accuracy gains: the relative reduction in RMSE is more than 30% in the above scenarios, and the forecast time gain increases to more than 36 hours.
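The relative drops quoted above can be checked with a few lines of arithmetic. The sketch below uses only the RMSE values stated in this article; the helper name is illustrative.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# (label, baseline RMSE, Pangu-Weather RMSE), all taken from the text above
cases = [
    ("Z500 3-day vs IFS", 152.8, 134.5),
    ("Z500 5-day vs IFS", 333.7, 296.7),
    ("T850 3-day vs IFS", 1.37, 1.14),
    ("T850 5-day vs IFS", 2.06, 1.79),
    ("Z500 5-day vs FourCastNet (ensemble)", 462.5, 296.7),
]

for label, baseline, improved in cases:
    print(f"{label}: {100 * relative_reduction(baseline, improved):.1f}% lower RMSE")
```

With these numbers, the reductions against operational IFS fall between roughly 11% and 17%, and the reduction against FourCastNet's 5-day Z500 result is about 36%, consistent with the "more than 10%" and "more than 30%" statements.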

Figure 5

Left: Quantitative comparison results of four upper-air atmospheric variables. Right: Quantitative comparison results of three surface weather variables.

[Surface weather variables] As shown in Figure 5, Pangu-Weather is evaluated on three surface weather variables: 2-meter temperature (T2M), u-component of 10-meter wind speed (U10), and v-component of 10-meter wind speed (V10). It delivers higher accuracy than operational IFS of ECMWF and FourCastNet of NVIDIA for all these variables. In terms of forecast time gain, Pangu-Weather delivers an 18-hour gain over operational IFS. For example, for T2M, the 3-day and 5-day RMSEs (in K) are 1.34 and 1.75 for operational IFS and 1.39 and 2.00 for FourCastNet, and Pangu-Weather reduces them to 1.05 and 1.53, respectively. For U10, the 3-day and 5-day RMSEs (in m/s) are 1.94 and 2.90 for operational IFS and 2.24 and 3.41 for FourCastNet, and Pangu-Weather reduces them to 1.61 and 2.53, respectively.

Figure 6 Visualization of 3-day forecast results of Pangu-Weather on the other two variables and comparison with traditional NWP forecast results and real values.

[Visualization] As shown in Figure 1 and Figure 6, Pangu-Weather predicts fine-grained meteorological features well. AI forecast results are usually smoother, whereas NWP results more often show features that do not exist in the ground truth. This reflects the differences and complementarity between AI-based methods and traditional NWP methods.

[Diagnosis studies] The paper provides two diagnosis studies.

As shown in Figure 1, in terms of Anomaly Correlation Coefficient (ACC), Pangu-Weather outperforms operational IFS in every month. More importantly, the advantage of Pangu-Weather becomes more significant in the months where forecasts perform worst, implying that AI-based methods have learned useful and complementary knowledge from large amounts of data.
As shown in Figure 7, a clear advantage of Pangu-Weather lies in its ability to perform hourly weather forecasts. In contrast, previous high-resolution AI-based methods (such as FourCastNet) can produce forecasts only at six-hour intervals.

Figure 7 Hourly weather forecast results of Pangu-Weather

5.2 Results on extreme weather events

[Overall tendency in predictions of extremes] Like the FourCastNet paper, the Pangu-Weather paper uses the relative quantile error (RQE) to measure how different weather forecast methods tend to predict extremes. (See the paper for the mathematical definition.) If RQE is less than 0, the model tends to underestimate the intensity of extremes. If RQE is greater than 0, the model tends to overestimate the intensity of extremes. The closer RQE is to 0, the more accurately the model captures extremes. As shown in Figure 8, both AI-based methods and NWP methods tend to underestimate the intensity of extremes. Compared with operational IFS, Pangu-Weather shows lower absolute RQE values (lighter underestimation) for Q500 and higher absolute RQE values (heavier underestimation) for U500. Regarding U10, Pangu-Weather is much better than operational IFS. Thanks to the efficient hierarchical temporal aggregation strategy, Pangu-Weather delivers a significantly higher RQE than FourCastNet (less underestimation) for U10. This is also consistent with the higher accuracy of Pangu-Weather in deterministic forecast.

Figure 8 RQE change trends of three weather variables with time

[Tropical cyclone tracking] As shown in Figure 9, Pangu-Weather can accurately predict the path of a tropical cyclone by checking four variables: MSLP, 850 hPa vorticity, 10-meter wind speed, and the thickness between 850 hPa and 200 hPa. (Please see the paper for more details.) The researchers apply this method to 88 named tropical cyclones in 2018 (the intersection of the IBTrACS data and the ECMWF-HRES tropical cyclone predictions in TIGGE) and find that Pangu-Weather delivers significantly higher accuracy than conventional NWP methods. Compared with ECMWF-HRES, Pangu-Weather shows clear advantages in tracking accuracy across different basins, intensities, and forecast times. For example, over the 88 named tropical cyclones, the 3-day and 5-day mean direct position errors (for cyclone eyes) of Pangu-Weather are 120.29 km and 195.65 km, respectively, significantly smaller than the 162.28 km and 272.10 km reported by ECMWF-HRES.
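For readers unfamiliar with the metric, a direct position error is commonly computed as the great-circle distance between the predicted and observed cyclone centers. The sketch below uses the haversine formula on a spherical Earth with a mean radius of 6371 km. These choices and all names are assumptions for illustration, and the paper's exact evaluation code may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_direct_position_error(predicted, observed):
    """Average distance in km between paired predicted and observed centers."""
    distances = [great_circle_km(*p, *o) for p, o in zip(predicted, observed)]
    return sum(distances) / len(distances)


# Toy positions (lat, lon), made up for illustration only
predicted = [(0.0, 1.0), (10.0, 130.0)]
observed = [(0.0, 0.0), (10.0, 130.0)]
print(f"{mean_direct_position_error(predicted, observed):.1f} km")
```

One degree of longitude at the equator is about 111 km, so the 3-day error of 120.29 km reported above corresponds to roughly one degree of arc.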

As shown in Figure 1 and Figure 10, Pangu-Weather delivers higher accuracy on two super typhoons of 2018, Kong-rey and Yutu (both of which saw multiple incorrect path forecasts over a long period). For Yutu in particular, Pangu-Weather makes the correct forecast of its landing (in the Philippines) 48 hours earlier than ECMWF-HRES.

Figure 9 Pangu-Weather accurately predicts the paths of tropical cyclones by tracking several key variables

Figure 10 Dynamic prediction results on Kong-rey show the leading position of Pangu-Weather over traditional NWP methods

5.3 Ensemble forecast

The paper also explores a simple method of ensemble forecast using Pangu-Weather. The researchers add random Perlin noise to the model input (the initial weather state) and obtain 99 forecasts with perturbations and 1 forecast without perturbations. As shown in Figure 11, the ensemble forecast is slightly less accurate than the unperturbed deterministic forecast at short range (less than 2 days), but significantly more accurate when the forecast time is longer than 5 days. For example, with ensemble forecast, the RMSEs of the 7-day forecast for Z500 and U10 are reduced from 500.3 and 3.48 to 450.6 and 2.96, with relative drops of 10% and 15%, respectively. With richer meteorological knowledge, more advanced ensemble forecast methods can be developed, such as using singular vectors. The researchers hope that the ensemble forecast of Pangu-Weather can be further improved with the involvement of more experienced meteorologists.
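The intuition for why an ensemble mean can beat its individual members is that errors pointing in different directions partly cancel when averaged. The toy sketch below shows this with two hand-made members whose errors are equal and opposite. The values are invented for illustration, and the perturbation scheme (Perlin noise in the paper) is not reproduced here.

```python
import math


def rmse(forecast, truth):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth))


def ensemble_mean(members):
    """Point-wise average over ensemble members."""
    return [sum(values) / len(members) for values in zip(*members)]


truth = [2.0, 2.0]
members = [
    [1.0, 3.0],  # toy member A: errors of -1 and +1
    [3.0, 1.0],  # toy member B: errors of +1 and -1
]

print("member RMSEs:", [rmse(m, truth) for m in members])
print("ensemble-mean RMSE:", rmse(ensemble_mean(members), truth))
```

In this contrived case each member has an RMSE of 1.0 while the ensemble mean matches the truth exactly. Real ensembles cancel errors only partially, which fits the observation above that the benefit appears mainly at longer lead times, where member forecasts have diverged.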

Figure 11 Ensemble forecast results of Pangu-Weather and their comparison with deterministic forecast results

6 Summary and prospects

The paper introduces Pangu-Weather, an AI-based system for global weather forecasting. The technical contributions of the paper include (i) designing the 3D Earth-Specific Transformer (3DEST) architecture and (ii) applying the hierarchical temporal aggregation strategy. By training deep neural networks on 39 years of global weather data, Pangu-Weather, for the first time, surpasses conventional NWP methods in terms of both accuracy and speed. Being efficient in inference, Pangu-Weather opens a window for meteorologists to integrate their knowledge into AI-based methods for more exciting applications.

Looking into the future, computing power is the key to further improving the accuracy of AI-based weather forecast. According to the experiments provided in the paper, there is much room left in terms of (i) incorporating more observation factors, (ii) integrating the time dimension and training 4D deep networks, and (iii) simply using deeper and/or wider networks. All of these require more powerful GPUs with larger memory and higher FLOPS.

7 References

[1] P. Bougeault, Z. Toth, C. Bishop, B. Brown, D. Burridge, D. H. Chen, B. Ebert, M. Fuentes, T. M. Hamill, K. Mylne et al., "The thorpex interactive grand global ensemble," Bulletin of the American Meteorological Society, vol. 91, no. 8, pp. 1059–1072, 2010.
[2] J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli et al., "Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators," arXiv preprint arXiv:2202.11214, 2022.
[3] M. G. Schultz, C. Betancourt, B. Gong, F. Kleinert, M. Langguth, L. H. Leufen, A. Mozaffari, and S. Stadtler, "Can deep learning beat numerical weather prediction?" Philosophical Transactions of the Royal Society A, vol. 379, no. 2194, p. 20200097, 2021.
[4] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[5] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
[6] H. Hersbach, B. Bell, P. Berrisford, S. Hirahara, A. Horányi, J. Muñoz-Sabater, J. Nicolas, C. Peubey, R. Radu, D. Schepers et al., "The era5 global reanalysis," Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020.
[7] K. R. Knapp, M. C. Kruk, D. H. Levinson, H. J. Diamond, and C. J. Neumann, "The international best track archive for climate stewardship (ibtracs) unifying tropical cyclone data," Bulletin of the American Meteorological Society, vol. 91, no. 3, pp. 363–376, 2010.
[8] https://www.nvidia.com/en-us/on-demand/session/gtcfall22-a41326/
