Introduction to Resource Management (2)

"Everything involved with military is vital to countries’ living or death, therefore knowing the means of living is rather enssential"Quoting "The Art of War" by Sun Tzu has been regarded as a classic of military strategists for 2500 years.  It is a textbook-style military book model, and has been studied for generations. Emperor Li Shimin said "All ideas about military come from Sunzi".  And Sunzi's Art of War, which studies and expounds how to "Have the situation well in hand, then you can make a victory even a thousand miles away" is the earliest ancient Chinese book of military strategy. In the Art of War of Sunzi, Sunzi's flexible use of the principle of integrity to study military issues, his use of quantitative analysis to plan wars, and his use of the principle of optimization to make scientific decisions are all reflections of his operational thinking. Sun Tzu, for example, pointed out in the chapter of "Calculation". In the "Operations" section, the financial cost of a war is estimated by taking 100,000 troops for example,. In calculating the cost of transporting food and forage, it was noted that:When you get one ton food in enemy state, it is like you take 10-ton food in your own country; When you take 1-ton supply in enemy state, it’s like you get 10-ton supply from you own country..".As for decision-making, for another example, he states in "Meticulous":  For the tips of the war, keeping the enemy country is the best stragety, breaking it is the second option; Keeping the entire army is better than breaking it; Keeping the battalion is better then breaking it; Keeping the soldiers is better than killing them. As a result, defeating the army of the enemy country is not  the best solution, but defeating them without a war is the most superme solution."  These fully reflect and contain the idea of operational research.

In addition, the Song Dynasty Shen Kuo in the "Meng Xi Bi Tan" in the "Ying FU Gong Shi Hao" record: The palace was on fire during the reign of Song Xiang Fu. At that time, Ding Yu was in charge of the restoration of the palace. He was afraid that the place for collecting earth was too far away, so he ordered that a road be cut to collect earth. Soon the main road became a huge moat, and the Bian River flowed into the canal, bringing in rafts and rafts from all parts of the country and all kinds of building materials carried by boat, all of which were carried directly from the moat to the gate of the palace. When this had been done, the abandoned tiles and gravel were filled with dust and earth, and the street was thus made again. The three things that have been done to implement this initiative have all come together to save hundreds of millions of dollars.

The above examples demonstrate the importance of operational arrangements. Simply put, the best use of operational optimization ideas is to achieve the effect of less money do more, doing great things for small money.

The reason for mentioning so many stories about operations and optimization is closely related to the research of basic resource scheduling algorithms involved in the cloud computing Alkaid project. The algorithm innovation lab team of the Technology Innovation Dept undertakes a large number of algorithm research work in IaaS online scheduling and offline scheduling. In the scheduling algorithm, the mathematical idea of operational optimization is widely used. Through these ideas and mathematical means, scheduling problems are analyzed, modeled and optimized. First of all, let's take a look at the current research focus, the allocation rate scheduling algorithm.

1. Capacity Calculation

Elastic computing services on HUAWEI CLOUD provide multiple types of instances with different resource ratios and enhanced features (flavors). For each flavor, the sales trend needs to be predicted based on the current resource status, yearly/monthly orders, and future orders to determine whether to adjust the on-demand sellout watermark, allocate different resource pools, and expand the corresponding resource pool. This is the problem to be solved by capacity calculation.

In the previous issue, we have introduced capacity prediction, which focuses on estimating the future demand based on historical data. Capacity calculation focuses on estimating the sales of flavors based on the detailed constraints that need to be considered during flavor scheduling.

If only the number of resources is considered, the simplest method is to divide the remaining host resources by the flavor configuration resources to obtain the number of instances of the flavor in the future. Moreover, the vectorized parallel algorithm can be used to accelerate computing, which is suitable for large-scale resource pools. However, the real resource scheduling is far from simple. Factors such as heterogeneous resources, NUMA constraints, and flavor symmetric/asymmetric placement need to be considered. These factors cannot be solved by simple division calculation. Since scheduling constraints are so complex, you can directly schedule flavors one by one until no more flavor can be placed. Then you can obtain the number of flavor requests that can be accommodated. This is also the method currently used, which is called Dry-Run. This method can accurately calculate the flavor capacity. However, the calculation efficiency is low and cannot support large-scale calculation. Therefore, a capacity calculation algorithm that can ensure the calculation accuracy and support large-scale parallel calculation of multiple flavors needs to be developed.

2. Allocation rate scheduling algorithm

Based on the real-time VM requests, the optimal available server is selected to meet the requests. Because the requests are random and different restrictions are considered for different requests, the optimal placement in the entire process cannot be achieved during matching. As a result, certain remaining resources cannot be allocated.

The matching placement algorithm aims to reduce the number of fragments generated during online resource provisioning. Because fragmentation is inevitable, we want to reorganize the fragments through secondary adjustments to optimize resource allocation.

2.1  Matching Placement

VM requests of users arrive at random. Resources are released after on-demand tasks are completed. The specifications and time of subsequent user requests are unknown. The current VM placement affects the placement of future requests. Therefore, in terms of the ultimate goal, each step of decision must consider the satisfaction (real-time revenue) of the current VM request and the impact on the future.

The technical research solution will introduce future benefits and algorithm optimization of scheduling policy parameters, and design different evaluation indicators for different scenarios to measure the algorithm effect. Technical points include the predictability of future user request data, how to predict and quantify future benefits, how to build an evaluation system, how to develop indicators, how to optimize placement policies in multiple dimensions, and how to optimize policies.

2.2  Reinforcement Learning Algorithm

Reinforcement learning is suitable for the scheduling problem where VM requests come from random sequences and available resource pools are stable. In this problem, reinforcement learning treats the amount of resources available for allocating request hosts as a state, mapping the state to the value generated by the allocation behavior. The state value relation is the basis of scheduling decision. Select the allocation mode with the maximum value for each request. The number of requests that can be processed by limited host resources is more likely to be the maximum.

3. Secondary adjustment (offline scheduling)

Online scheduling requires real-time placement decisions when a VM request arrives, which determines that the decision can only be made based on the current resource status. However, the best decision for the moment may not necessarily be the best one for the future as subsequent VM requests continue to arrive. Moreover, each online scheduling is based on the result of the previous placement decision. As a result, the final placement effect is far different from the global optimal effect. This requires a second adjustment of resource allocation through offline scheduling.

The second adjustment involves two key problems. One is to calculate the current global optimal solution based on the current resource and request status. The scheduling algorithm used is similar to the online scheduling algorithm described above. The second is how to adjust from the current actual placement result to the result closest to the calculated global optimal solution. Why is it the closest thing to an optimal solution, rather than directly adjusting to it? First, there are many constraints: symmetric/asymmetric multi-NUMA, no hotspot, and affinity/anti-affinity. Second, there are many optimization goals: the fragmentation rate of each resource dimension cannot be increased, and the migration cost is low.

Although the second adjustment includes two key issues, the two issues restrict and affect each other. Therefore, the relationship between the migration cost and the defragmentation benefit (decreasing fragment rate) needs to be balanced, and unified modeling is required. At present, the feasibility and sequence of parallel migration of virtual machines are analyzed, and Pareto frontier is found by multi-objective optimization modeling.