Huawei Cloud User Manual

  • Procedure: Run the following SQL statements to create the target tables (24 tables in total).
CREATE TABLE customer_address ( ca_address_sk bigint not null , ca_address_id char(16) not null, ca_street_number char(10) , ca_street_name varchar(60) , ca_street_type char(15) , ca_suite_number char(10) , ca_city varchar(60) , ca_county varchar(30) , ca_state char(2) , ca_zip char(10) , ca_country varchar(20) , ca_gmt_offset decimal(5,2) , ca_location_type char(20) ) with (orientation = column) distribute by hash (ca_address_sk);
CREATE TABLE customer_demographics ( cd_demo_sk bigint not null , cd_gender char(1) , cd_marital_status char(1) , cd_education_status char(20) , cd_purchase_estimate bigint , cd_credit_rating char(10) , cd_dep_count bigint , cd_dep_employed_count bigint , cd_dep_college_count bigint ) with (orientation = column) distribute by hash (cd_demo_sk);
CREATE TABLE date_dim ( d_date_sk bigint not null, d_date_id char(16) not null, d_date date , d_month_seq bigint , d_week_seq bigint , d_quarter_seq bigint , d_year bigint , d_dow bigint , d_moy bigint , d_dom bigint , d_qoy bigint , d_fy_year bigint , d_fy_quarter_seq bigint , d_fy_week_seq bigint , d_day_name char(9) , d_quarter_name char(6) , d_holiday char(1) , d_weekend char(1) , d_following_holiday char(1) , d_first_dom bigint , d_last_dom bigint , d_same_day_ly bigint , d_same_day_lq bigint , d_current_day char(1) , d_current_week char(1) , d_current_month char(1) , d_current_quarter char(1) ,
d_current_year char(1) ) with (orientation = column) DISTRIBUTE by hash(d_date_sk) PARTITION BY Range(d_year) ( partition p1 values less than(1950), partition p2 values less than(2000), partition p3 values less than(2050), partition p4 values less than(2100), partition p5 values less than(3000), partition p6 values less than(maxvalue) ); CREATE TABLE warehouse ( w_warehouse_sk bigint not null, w_warehouse_id char(16) not null, w_warehouse_name varchar(20) , w_warehouse_sq_ft bigint , w_street_number char(10) , w_street_name varchar(60) , w_street_type char(15) , w_suite_number char(10) , w_city varchar(60) , w_county varchar(30) , w_state char(2) , w_zip char(10) , w_country varchar(20) , w_gmt_offset decimal(5,2) ) with (orientation = column) distribute by replication; CREATE TABLE ship_mode ( sm_ship_mode_sk bigint not null, sm_ship_mode_id char(16) not null, sm_type char(30) , sm_code char(10) , sm_carrier char(20) , sm_contract char(20) ) with (orientation = column) distribute by replication; CREATE TABLE time_dim ( t_time_sk bigint not null, t_time_id char(16) not null, t_time bigint , t_hour bigint , t_minute bigint , t_second bigint , t_am_pm char(2) , t_shift char(20) , t_sub_shift char(20) , t_meal_time char(20) ) with (orientation = column) distribute by hash (t_time_sk); CREATE TABLE reason ( r_reason_sk bigint not null, r_reason_id char(16) not null, r_reason_desc char(100) ) with (orientation = column) distribute by replication; CREATE TABLE income_band ( ib_income_band_sk bigint not null, ib_lower_bound bigint , ib_upper_bound bigint ) with (orientation = column) distribute by replication; CREATE TABLE item ( i_item_sk bigint not null, i_item_id char(16) not null, i_rec_start_date date , i_rec_end_date date , i_item_desc varchar(200) , i_current_price decimal(7,2) , i_wholesale_cost decimal(7,2) , i_brand_id bigint , i_brand char(50) , i_class_id bigint , i_class char(50) , i_category_id bigint , i_category char(50) , i_manufact_id bigint , i_manufact 
char(50) , i_size char(20) , i_formulation char(20) , i_color char(20) , i_units char(10) , i_container char(10) , i_manager_id bigint , i_product_name char(50) ) with (orientation = column) distribute by hash (i_item_sk); CREATE TABLE store ( s_store_sk bigint not null, s_store_id char(16) not null, s_rec_start_date date , s_rec_end_date date , s_closed_date_sk bigint , s_store_name varchar(50) , s_number_employees bigint , s_floor_space bigint , s_hours char(20) , s_manager varchar(40) , s_market_id bigint , s_geography_class varchar(100) , s_market_desc varchar(100) , s_market_manager varchar(40) , s_division_id bigint , s_division_name varchar(50) , s_company_id bigint , s_company_name varchar(50) , s_street_number varchar(10) , s_street_name varchar(60) , s_street_type char(15) , s_suite_number char(10) , s_city varchar(60) , s_county varchar(30) , s_state char(2) , s_zip char(10) , s_country varchar(20) , s_gmt_offset decimal(5,2) , s_tax_precentage decimal(5,2) ) with (orientation = column) distribute by replication; CREATE TABLE call_center ( cc_call_center_sk bigint not null, cc_call_center_id char(16) not null, cc_rec_start_date date , cc_rec_end_date date , cc_closed_date_sk bigint , cc_open_date_sk bigint , cc_name varchar(50) , cc_class varchar(50) , cc_employees bigint , cc_sq_ft bigint , cc_hours char(20) , cc_manager varchar(40) , cc_mkt_id bigint , cc_mkt_class char(50) , cc_mkt_desc varchar(100) , cc_market_manager varchar(40) , cc_division bigint , cc_division_name varchar(50) , cc_company bigint , cc_company_name char(50) , cc_street_number char(10) , cc_street_name varchar(60) , cc_street_type char(15) , cc_suite_number char(10) , cc_city varchar(60) , cc_county varchar(30) , cc_state char(2) , cc_zip char(10) , cc_country varchar(20) , cc_gmt_offset decimal(5,2) , cc_tax_percentage decimal(5,2) ) with (orientation = column) distribute by replication; drop table if exists customer; CREATE TABLE customer ( c_customer_sk bigint not null, 
c_customer_id char(16) not null, c_current_cdemo_sk bigint , c_current_hdemo_sk bigint , c_current_addr_sk bigint , c_first_shipto_date_sk bigint , c_first_sales_date_sk bigint , c_salutation char(10) , c_first_name char(20) , c_last_name char(30) , c_preferred_cust_flag char(1) , c_birth_day bigint , c_birth_month bigint , c_birth_year bigint , c_birth_country varchar(20) , c_login char(13) , c_email_address char(50) , c_last_review_date_sk char(10) ) with (orientation = column) distribute by hash (c_customer_sk); CREATE TABLE web_site ( web_site_sk bigint not null, web_site_id char(16) not null, web_rec_start_date date , web_rec_end_date date , web_name varchar(50) , web_open_date_sk bigint , web_close_date_sk bigint , web_class varchar(50) , web_manager varchar(40) , web_mkt_id bigint , web_mkt_class varchar(50) , web_mkt_desc varchar(100) , web_market_manager varchar(40) , web_company_id bigint , web_company_name char(50) , web_street_number char(10) , web_street_name varchar(60) , web_street_type char(15) , web_suite_number char(10) , web_city varchar(60) , web_county varchar(30) , web_state char(2) , web_zip char(10) , web_country varchar(20) , web_gmt_offset decimal(5,2) , web_tax_percentage decimal(5,2) ) with (orientation = column) distribute by replication; CREATE TABLE household_demographics ( hd_demo_sk bigint not null, hd_income_band_sk bigint , hd_buy_potential char(15) , hd_dep_count bigint , hd_vehicle_count bigint ) with (orientation = column) distribute by hash (hd_demo_sk); CREATE TABLE web_page ( wp_web_page_sk bigint not null, wp_web_page_id char(16) not null, wp_rec_start_date date , wp_rec_end_date date , wp_creation_date_sk bigint , wp_access_date_sk bigint , wp_autogen_flag char(1) , wp_customer_sk bigint , wp_url varchar(100) , wp_type char(50) , wp_char_count bigint , wp_link_count bigint , wp_image_count bigint , wp_max_ad_count bigint ) with (orientation = column) distribute by replication; CREATE TABLE promotion ( p_promo_sk bigint not 
null, p_promo_id char(16) not null, p_start_date_sk bigint , p_end_date_sk bigint , p_item_sk bigint , p_cost decimal(15,2) , p_response_target bigint , p_promo_name char(50) , p_channel_dmail char(1) , p_channel_email char(1) , p_channel_catalog char(1) , p_channel_tv char(1) , p_channel_radio char(1) , p_channel_press char(1) , p_channel_event char(1) , p_channel_demo char(1) , p_channel_details varchar(100) , p_purpose char(15) , p_discount_active char(1) ) with (orientation = column) DISTRIBUTE BY HASH(p_promo_sk); CREATE TABLE catalog_page ( cp_catalog_page_sk bigint not null, cp_catalog_page_id char(16) not null, cp_start_date_sk bigint , cp_end_date_sk bigint , cp_department varchar(50) , cp_catalog_number bigint , cp_catalog_page_number bigint , cp_description varchar(100) , cp_type varchar(100) ) with (orientation = column) distribute by hash (cp_catalog_page_sk); CREATE TABLE inventory ( inv_date_sk bigint not null, inv_item_sk bigint not null, inv_warehouse_sk bigint not null, inv_quantity_on_hand integer ) with (orientation = column) distribute by hash (inv_item_sk) partition by range(inv_date_sk) ( partition p1 values less than(2451180), partition p2 values less than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ; CREATE TABLE catalog_returns ( cr_returned_date_sk bigint , cr_returned_time_sk bigint , cr_item_sk bigint not null, cr_refunded_customer_sk bigint , cr_refunded_cdemo_sk bigint , cr_refunded_hdemo_sk bigint , cr_refunded_addr_sk bigint , cr_returning_customer_sk bigint , cr_returning_cdemo_sk bigint , cr_returning_hdemo_sk bigint , cr_returning_addr_sk bigint , cr_call_center_sk bigint , cr_catalog_page_sk bigint , cr_ship_mode_sk bigint , cr_warehouse_sk bigint , cr_reason_sk bigint , cr_order_number bigint not null, cr_return_quantity bigint , cr_return_amount decimal(7,2) , 
cr_return_tax decimal(7,2) , cr_return_amt_inc_tax decimal(7,2) , cr_fee decimal(7,2) , cr_return_ship_cost decimal(7,2) , cr_refunded_cash decimal(7,2) , cr_reversed_charge decimal(7,2) , cr_store_credit decimal(7,2) , cr_net_loss decimal(7,2) ) with (orientation = column) distribute by hash (cr_item_sk) partition by range(cr_returned_date_sk) ( partition p1 values less than(2450815), partition p2 values less than(2451180), partition p3 values less than(2451545), partition p4 values less than(2451911), partition p5 values less than(2452276), partition p6 values less than(2452641), partition p7 values less than(2453006), partition p8 values less than(maxvalue) ) ; CREATE TABLE web_returns ( wr_returned_date_sk bigint , wr_returned_time_sk bigint , wr_item_sk bigint not null, wr_refunded_customer_sk bigint , wr_refunded_cdemo_sk bigint , wr_refunded_hdemo_sk bigint , wr_refunded_addr_sk bigint , wr_returning_customer_sk bigint , wr_returning_cdemo_sk bigint , wr_returning_hdemo_sk bigint , wr_returning_addr_sk bigint , wr_web_page_sk bigint , wr_reason_sk bigint , wr_order_number bigint not null, wr_return_quantity bigint , wr_return_amt decimal(7,2) , wr_return_tax decimal(7,2) , wr_return_amt_inc_tax decimal(7,2) , wr_fee decimal(7,2) , wr_return_ship_cost decimal(7,2) , wr_refunded_cash decimal(7,2) , wr_reversed_charge decimal(7,2) , wr_account_credit decimal(7,2) , wr_net_loss decimal(7,2) ) with (orientation = column) distribute by hash (wr_item_sk) partition by range(wr_returned_date_sk) ( partition p1 values less than(2450815), partition p2 values less than(2451180), partition p3 values less than(2451545), partition p4 values less than(2451911), partition p5 values less than(2452276), partition p6 values less than(2452641), partition p7 values less than(2453006), partition p8 values less than(maxvalue) ) ; CREATE TABLE store_returns ( sr_returned_date_sk bigint , sr_return_time_sk bigint , sr_item_sk bigint not null, sr_customer_sk bigint , sr_cdemo_sk 
bigint , sr_hdemo_sk bigint , sr_addr_sk bigint , sr_store_sk bigint , sr_reason_sk bigint , sr_ticket_number bigint not null, sr_return_quantity bigint , sr_return_amt decimal(7,2) , sr_return_tax decimal(7,2) , sr_return_amt_inc_tax decimal(7,2) , sr_fee decimal(7,2) , sr_return_ship_cost decimal(7,2) , sr_refunded_cash decimal(7,2) , sr_reversed_charge decimal(7,2) , sr_store_credit decimal(7,2) , sr_net_loss decimal(7,2) ) with (orientation = column) distribute by hash (sr_item_sk) partition by range(sr_returned_date_sk) ( partition p1 values less than (2451180) , partition p2 values less than (2451545) , partition p3 values less than (2451911) , partition p4 values less than (2452276) , partition p5 values less than (2452641) , partition p6 values less than (2453006) , partition p7 values less than (maxvalue) ) ; CREATE TABLE web_sales ( ws_sold_date_sk bigint , ws_sold_time_sk bigint , ws_ship_date_sk bigint , ws_item_sk bigint not null, ws_bill_customer_sk bigint , ws_bill_cdemo_sk bigint , ws_bill_hdemo_sk bigint , ws_bill_addr_sk bigint , ws_ship_customer_sk bigint , ws_ship_cdemo_sk bigint , ws_ship_hdemo_sk bigint , ws_ship_addr_sk bigint , ws_web_page_sk bigint , ws_web_site_sk bigint , ws_ship_mode_sk bigint , ws_warehouse_sk bigint , ws_promo_sk bigint , ws_order_number bigint not null, ws_quantity bigint , ws_wholesale_cost decimal(7,2) , ws_list_price decimal(7,2) , ws_sales_price decimal(7,2) , ws_ext_discount_amt decimal(7,2) , ws_ext_sales_price decimal(7,2) , ws_ext_wholesale_cost decimal(7,2) , ws_ext_list_price decimal(7,2) , ws_ext_tax decimal(7,2) , ws_coupon_amt decimal(7,2) , ws_ext_ship_cost decimal(7,2) , ws_net_paid decimal(7,2) , ws_net_paid_inc_tax decimal(7,2) , ws_net_paid_inc_ship decimal(7,2) , ws_net_paid_inc_ship_tax decimal(7,2) , ws_net_profit decimal(7,2) ) with (orientation = column) distribute by hash (ws_item_sk) partition by range(ws_sold_date_sk) ( partition p1 values less than(2451180), partition p2 values less 
than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ; CREATE TABLE catalog_sales ( cs_sold_date_sk bigint , cs_sold_time_sk bigint , cs_ship_date_sk bigint , cs_bill_customer_sk bigint , cs_bill_cdemo_sk bigint , cs_bill_hdemo_sk bigint , cs_bill_addr_sk bigint , cs_ship_customer_sk bigint , cs_ship_cdemo_sk bigint , cs_ship_hdemo_sk bigint , cs_ship_addr_sk bigint , cs_call_center_sk bigint , cs_catalog_page_sk bigint , cs_ship_mode_sk bigint , cs_warehouse_sk bigint , cs_item_sk bigint not null, cs_promo_sk bigint , cs_order_number bigint not null, cs_quantity bigint , cs_wholesale_cost decimal(7,2) , cs_list_price decimal(7,2) , cs_sales_price decimal(7,2) , cs_ext_discount_amt decimal(7,2) , cs_ext_sales_price decimal(7,2) , cs_ext_wholesale_cost decimal(7,2) , cs_ext_list_price decimal(7,2) , cs_ext_tax decimal(7,2) , cs_coupon_amt decimal(7,2) , cs_ext_ship_cost decimal(7,2) , cs_net_paid decimal(7,2) , cs_net_paid_inc_tax decimal(7,2) , cs_net_paid_inc_ship decimal(7,2) , cs_net_paid_inc_ship_tax decimal(7,2) , cs_net_profit decimal(7,2) ) with (orientation = column) distribute by hash (cs_item_sk) partition by range(cs_sold_date_sk) ( partition p1 values less than(2451180), partition p2 values less than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ; CREATE TABLE store_sales ( ss_sold_date_sk bigint , ss_sold_time_sk bigint , ss_item_sk bigint not null, ss_customer_sk bigint , ss_cdemo_sk bigint , ss_hdemo_sk bigint , ss_addr_sk bigint , ss_store_sk bigint , ss_promo_sk bigint , ss_ticket_number bigint not null, ss_quantity bigint , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , 
ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) with (orientation = column) distribute by hash (ss_item_sk) partition by range(ss_sold_date_sk) ( partition p1 values less than(2451180), partition p2 values less than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ;

Run the following SQL statements to create the GDS foreign tables (24 tables in total). In each foreign table, replace the IP address and port in "gsfs://192.168.0.90:500x/xxx | gsfs://192.168.0.90:500x/xxx" with the listening IP address and port of the corresponding GDS instance, as configured in "Installing and Starting GDS". If two GDS instances are started, separate their addresses with "|". If multiple GDS servers are configured, the listening IP address and port of every GDS instance must be included in the foreign table.
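Because every foreign table repeats the same endpoints, the location value can be generated rather than edited by hand. The following is a minimal sketch, not part of the product: the helper name gds_location and its parameters are hypothetical, and the endpoint values are the sample addresses used in this document.

```python
def gds_location(endpoints, file_pattern):
    """Build the value of the foreign table `location` option.

    endpoints:    list of (ip, port) tuples, one per running GDS instance
                  (hypothetical parameter, not a GaussDB(DWS) API).
    file_pattern: data file pattern served by GDS, e.g. "customer_address.dat*".
    """
    if not endpoints:
        raise ValueError("at least one GDS endpoint is required")
    urls = [f"gsfs://{ip}:{port}/{file_pattern}" for ip, port in endpoints]
    # Several GDS instances are joined with "|", as in the statements in this section.
    return " | ".join(urls)


# Sample values copied from this document; replace them with your own GDS listeners.
location = gds_location(
    [("192.168.0.90", 5002), ("192.168.0.90", 5003)],
    "customer_address.dat*",
)
print(location)
```

The resulting string can be pasted between the single quotes of the location option in each CREATE FOREIGN TABLE statement.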
DROP FOREIGN TABLE IF EXISTS customer_address_ext; CREATE FOREIGN TABLE customer_address_ext ( ca_address_sk bigint , ca_address_id char(16) , ca_street_number char(10) , ca_street_name varchar(60) ,
ca_street_type char(15) , ca_suite_number char(10) , ca_city varchar(60) , ca_county varchar(30) , ca_state char(2) , ca_zip char(10) , ca_country varchar(20) , ca_gmt_offset decimal(5,2) , ca_location_type char(20) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/customer_address.dat* | gsfs://192.168.0.90:5003/customer_address.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with customer_address_err ; DROP FOREIGN TABLE IF EXISTS customer_demographics_ext; CREATE FOREIGN TABLE customer_demographics_ext ( cd_demo_sk bigint , cd_gender char(1) , cd_marital_status char(1) , cd_education_status char(20) , cd_purchase_estimate bigint , cd_credit_rating char(10) , cd_dep_count bigint , cd_dep_employed_count bigint , cd_dep_college_count bigint ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/customer_demographics.dat* | gsfs://192.168.0.90:5003/customer_demographics.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with customer_demographics_err ; DROP FOREIGN TABLE IF EXISTS date_dim_ext; CREATE FOREIGN TABLE date_dim_ext ( d_date_sk bigint , d_date_id char(16) , d_date date , d_month_seq bigint , d_week_seq bigint , d_quarter_seq bigint , d_year bigint , d_dow bigint , d_moy bigint , d_dom bigint , d_qoy bigint , d_fy_year bigint , d_fy_quarter_seq bigint , d_fy_week_seq bigint , d_day_name char(9) , d_quarter_name char(6) , d_holiday char(1) , d_weekend char(1) , d_following_holiday char(1) , d_first_dom bigint , d_last_dom bigint , d_same_day_ly bigint , d_same_day_lq bigint , d_current_day char(1) , d_current_week char(1) , d_current_month char(1) , d_current_quarter char(1) , d_current_year char(1) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/date_dim.dat* | gsfs://192.168.0.90:5003/date_dim.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with date_dim_err ; DROP FOREIGN TABLE IF EXISTS warehouse_ext; CREATE FOREIGN TABLE warehouse_ext ( 
w_warehouse_sk bigint , w_warehouse_id char(16) , w_warehouse_name varchar(20) , w_warehouse_sq_ft bigint , w_street_number char(10) , w_street_name varchar(60) , w_street_type char(15) , w_suite_number char(10) , w_city varchar(60) , w_county varchar(30) , w_state char(2) , w_zip char(10) , w_country varchar(20) , w_gmt_offset decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/warehouse.dat* | gsfs://192.168.0.90:5003/warehouse.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with warehouse_err ; DROP FOREIGN TABLE IF EXISTS ship_mode_ext; CREATE FOREIGN TABLE ship_mode_ext ( sm_ship_mode_sk bigint , sm_ship_mode_id char(16) , sm_type char(30) , sm_code char(10) , sm_carrier char(20) , sm_contract char(20) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/ship_mode.dat* | gsfs://192.168.0.90:5003/ship_mode.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with ship_mode_err ; DROP FOREIGN TABLE IF EXISTS time_dim_ext; CREATE FOREIGN TABLE time_dim_ext ( t_time_sk bigint , t_time_id char(16) , t_time bigint , t_hour bigint , t_minute bigint , t_second bigint , t_am_pm char(2) , t_shift char(20) , t_sub_shift char(20) , t_meal_time char(20) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/time_dim.dat* | gsfs://192.168.0.90:5003/time_dim.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with time_dim_err ; DROP FOREIGN TABLE IF EXISTS reason_ext; CREATE FOREIGN TABLE reason_ext ( r_reason_sk bigint , r_reason_id char(16) , r_reason_desc char(100) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/reason.dat* | gsfs://192.168.0.90:5003/reason.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with reason_err ; DROP FOREIGN TABLE IF EXISTS income_band_ext; CREATE FOREIGN TABLE income_band_ext ( ib_income_band_sk bigint , ib_lower_bound bigint , ib_upper_bound bigint ) SERVER gsmpp_server OPTIONS(location 
'gsfs://192.168.0.90:5002/income_band.dat* | gsfs://192.168.0.90:5003/income_band.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with income_band_err ; DROP FOREIGN TABLE IF EXISTS item_ext; CREATE FOREIGN TABLE item_ext ( i_item_sk bigint , i_item_id char(16) , i_rec_start_date date , i_rec_end_date date , i_item_desc varchar(200) , i_current_price decimal(7,2) , i_wholesale_cost decimal(7,2) , i_brand_id bigint , i_brand char(50) , i_class_id bigint , i_class char(50) , i_category_id bigint , i_category char(50) , i_manufact_id bigint , i_manufact char(50) , i_size char(20) , i_formulation char(20) , i_color char(20) , i_units char(10) , i_container char(10) , i_manager_id bigint , i_product_name char(50) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/item.dat* | gsfs://192.168.0.90:5003/item.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with item_err ; DROP FOREIGN TABLE IF EXISTS store_ext; CREATE FOREIGN TABLE store_ext ( s_store_sk bigint , s_store_id char(16) , s_rec_start_date date , s_rec_end_date date , s_closed_date_sk bigint , s_store_name varchar(50) , s_number_employees bigint , s_floor_space bigint , s_hours char(20) , s_manager varchar(40) , s_market_id bigint , s_geography_class varchar(100) , s_market_desc varchar(100) , s_market_manager varchar(40) , s_division_id bigint , s_division_name varchar(50) , s_company_id bigint , s_company_name varchar(50) , s_street_number varchar(10) , s_street_name varchar(60) , s_street_type char(15) , s_suite_number char(10) , s_city varchar(60) , s_county varchar(30) , s_state char(2) , s_zip char(10) , s_country varchar(20) , s_gmt_offset decimal(5,2) , s_tax_precentage decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/store_[^rs]_* | gsfs://192.168.0.90:5003/store_[^rs]_*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with store_err ; DROP FOREIGN TABLE IF EXISTS call_center_ext; CREATE FOREIGN 
TABLE call_center_ext ( cc_call_center_sk bigint , cc_call_center_id char(16) , cc_rec_start_date date , cc_rec_end_date date , cc_closed_date_sk bigint , cc_open_date_sk bigint , cc_name varchar(50) , cc_class varchar(50) , cc_employees bigint , cc_sq_ft bigint , cc_hours char(20) , cc_manager varchar(40) , cc_mkt_id bigint , cc_mkt_class char(50) , cc_mkt_desc varchar(100) , cc_market_manager varchar(40) , cc_division bigint , cc_division_name varchar(50) , cc_company bigint , cc_company_name char(50) , cc_street_number char(10) , cc_street_name varchar(60) , cc_street_type char(15) , cc_suite_number char(10) , cc_city varchar(60) , cc_county varchar(30) , cc_state char(2) , cc_zip char(10) , cc_country varchar(20) , cc_gmt_offset decimal(5,2) , cc_tax_percentage decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/call_center.dat* | gsfs://192.168.0.90:5003/call_center.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with call_center_err ; DROP FOREIGN TABLE IF EXISTS customer_ext; CREATE FOREIGN TABLE customer_ext ( c_customer_sk bigint , c_customer_id char(16) , c_current_cdemo_sk bigint , c_current_hdemo_sk bigint , c_current_addr_sk bigint , c_first_shipto_date_sk bigint , c_first_sales_date_sk bigint , c_salutation char(10) , c_first_name char(20) , c_last_name char(30) , c_preferred_cust_flag char(1) , c_birth_day bigint , c_birth_month bigint , c_birth_year bigint , c_birth_country varchar(20) , c_login char(13) , c_email_address char(50) , c_last_review_date_sk char(10) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/customer_[^ad]_* | gsfs://192.168.0.90:5003/customer_[^ad]_*', FORMAT 'TEXT' , DELIMITER '|', encoding 'GBK', mode 'Normal' ) with customer_err ; DROP FOREIGN TABLE IF EXISTS web_site_ext; CREATE FOREIGN TABLE web_site_ext ( web_site_sk bigint , web_site_id char(16) , web_rec_start_date date , web_rec_end_date date , web_name varchar(50) , web_open_date_sk bigint , 
web_close_date_sk bigint , web_class varchar(50) , web_manager varchar(40) , web_mkt_id bigint , web_mkt_class varchar(50) , web_mkt_desc varchar(100) , web_market_manager varchar(40) , web_company_id bigint , web_company_name char(50) , web_street_number char(10) , web_street_name varchar(60) , web_street_type char(15) , web_suite_number char(10) , web_city varchar(60) , web_county varchar(30) , web_state char(2) , web_zip char(10) , web_country varchar(20) , web_gmt_offset decimal(5,2) , web_tax_percentage decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_site.dat* | gsfs://192.168.0.90:5003/web_site.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with web_site_err ; DROP FOREIGN TABLE IF EXISTS store_returns_ext; CREATE FOREIGN TABLE store_returns_ext ( sr_returned_date_sk bigint , sr_return_time_sk bigint , sr_item_sk bigint , sr_customer_sk bigint , sr_cdemo_sk bigint , sr_hdemo_sk bigint , sr_addr_sk bigint , sr_store_sk bigint , sr_reason_sk bigint , sr_ticket_number bigint , sr_return_quantity bigint , sr_return_amt decimal(7,2) , sr_return_tax decimal(7,2) , sr_return_amt_inc_tax decimal(7,2) , sr_fee decimal(7,2) , sr_return_ship_cost decimal(7,2) , sr_refunded_cash decimal(7,2) , sr_reversed_charge decimal(7,2) , sr_store_credit decimal(7,2) , sr_net_loss decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/store_returns.dat* | gsfs://192.168.0.90:5003/store_returns.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with store_returns_err ; DROP FOREIGN TABLE IF EXISTS household_demographics_ext; CREATE FOREIGN TABLE household_demographics_ext ( hd_demo_sk bigint , hd_income_band_sk bigint , hd_buy_potential char(15) , hd_dep_count bigint , hd_vehicle_count bigint ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/household_demographics.dat* | gsfs://192.168.0.90:5003/household_demographics.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 
'utf8', mode 'Normal' ) with household_demographics_err ; DROP FOREIGN TABLE IF EXISTS web_page_ext; CREATE FOREIGN TABLE web_page_ext ( wp_web_page_sk bigint , wp_web_page_id char(16) , wp_rec_start_date date , wp_rec_end_date date , wp_creation_date_sk bigint , wp_access_date_sk bigint , wp_autogen_flag char(1) , wp_customer_sk bigint , wp_url varchar(100) , wp_type char(50) , wp_char_count bigint , wp_link_count bigint , wp_image_count bigint , wp_max_ad_count bigint ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_page.dat* | gsfs://192.168.0.90:5003/web_page.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with web_page_err ; DROP FOREIGN TABLE IF EXISTS promotion_ext; CREATE FOREIGN TABLE promotion_ext ( p_promo_sk bigint , p_promo_id char(16) , p_start_date_sk bigint , p_end_date_sk bigint , p_item_sk bigint , p_cost decimal(15,2) , p_response_target bigint , p_promo_name char(50) , p_channel_dmail char(1) , p_channel_email char(1) , p_channel_catalog char(1) , p_channel_tv char(1) , p_channel_radio char(1) , p_channel_press char(1) , p_channel_event char(1) , p_channel_demo char(1) , p_channel_details varchar(100) , p_purpose char(15) , p_discount_active char(1) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/promotion.dat* | gsfs://192.168.0.90:5003/promotion.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with promotion_err ; DROP FOREIGN TABLE IF EXISTS catalog_page_ext; CREATE FOREIGN TABLE catalog_page_ext ( cp_catalog_page_sk bigint , cp_catalog_page_id char(16) , cp_start_date_sk bigint , cp_end_date_sk bigint , cp_department varchar(50) , cp_catalog_number bigint , cp_catalog_page_number bigint , cp_description varchar(100) , cp_type varchar(100) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/catalog_page.dat* | gsfs://192.168.0.90:5003/catalog_page.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with catalog_page_err ; DROP 
FOREIGN TABLE IF EXISTS inventory_ext; CREATE FOREIGN TABLE inventory_ext ( inv_date_sk bigint , inv_item_sk bigint , inv_warehouse_sk bigint , inv_quantity_on_hand integer ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/inventory.dat* | gsfs://192.168.0.90:5003/inventory.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with inventory_err ; DROP FOREIGN TABLE IF EXISTS catalog_returns_ext; CREATE FOREIGN TABLE catalog_returns_ext ( cr_returned_date_sk bigint , cr_returned_time_sk bigint , cr_item_sk bigint , cr_refunded_customer_sk bigint , cr_refunded_cdemo_sk bigint , cr_refunded_hdemo_sk bigint , cr_refunded_addr_sk bigint , cr_returning_customer_sk bigint , cr_returning_cdemo_sk bigint , cr_returning_hdemo_sk bigint , cr_returning_addr_sk bigint , cr_call_center_sk bigint , cr_catalog_page_sk bigint , cr_ship_mode_sk bigint , cr_warehouse_sk bigint , cr_reason_sk bigint , cr_order_number bigint , cr_return_quantity bigint , cr_return_amount decimal(7,2) , cr_return_tax decimal(7,2) , cr_return_amt_inc_tax decimal(7,2) , cr_fee decimal(7,2) , cr_return_ship_cost decimal(7,2) , cr_refunded_cash decimal(7,2) , cr_reversed_charge decimal(7,2) , cr_store_credit decimal(7,2) , cr_net_loss decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/catalog_returns.dat* | gsfs://192.168.0.90:5003/catalog_returns.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with catalog_returns_err ; DROP FOREIGN TABLE IF EXISTS web_returns_ext; CREATE FOREIGN TABLE web_returns_ext ( wr_returned_date_sk bigint , wr_returned_time_sk bigint , wr_item_sk bigint , wr_refunded_customer_sk bigint , wr_refunded_cdemo_sk bigint , wr_refunded_hdemo_sk bigint , wr_refunded_addr_sk bigint , wr_returning_customer_sk bigint , wr_returning_cdemo_sk bigint , wr_returning_hdemo_sk bigint , wr_returning_addr_sk bigint , wr_web_page_sk bigint , wr_reason_sk bigint , wr_order_number bigint , wr_return_quantity bigint , 
wr_return_amt decimal(7,2) , wr_return_tax decimal(7,2) , wr_return_amt_inc_tax decimal(7,2) , wr_fee decimal(7,2) , wr_return_ship_cost decimal(7,2) , wr_refunded_cash decimal(7,2) , wr_reversed_charge decimal(7,2) , wr_account_credit decimal(7,2) , wr_net_loss decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_returns.dat* | gsfs://192.168.0.90:5003/web_returns.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with web_returns_err ; DROP FOREIGN TABLE IF EXISTS web_sales_ext; CREATE FOREIGN TABLE web_sales_ext ( ws_sold_date_sk bigint , ws_sold_time_sk bigint , ws_ship_date_sk bigint , ws_item_sk bigint , ws_bill_customer_sk bigint , ws_bill_cdemo_sk bigint , ws_bill_hdemo_sk bigint , ws_bill_addr_sk bigint , ws_ship_customer_sk bigint , ws_ship_cdemo_sk bigint , ws_ship_hdemo_sk bigint , ws_ship_addr_sk bigint , ws_web_page_sk bigint , ws_web_site_sk bigint , ws_ship_mode_sk bigint , ws_warehouse_sk bigint , ws_promo_sk bigint , ws_order_number bigint , ws_quantity bigint , ws_wholesale_cost decimal(7,2) , ws_list_price decimal(7,2) , ws_sales_price decimal(7,2) , ws_ext_discount_amt decimal(7,2) , ws_ext_sales_price decimal(7,2) , ws_ext_wholesale_cost decimal(7,2) , ws_ext_list_price decimal(7,2) , ws_ext_tax decimal(7,2) , ws_coupon_amt decimal(7,2) , ws_ext_ship_cost decimal(7,2) , ws_net_paid decimal(7,2) , ws_net_paid_inc_tax decimal(7,2) , ws_net_paid_inc_ship decimal(7,2) , ws_net_paid_inc_ship_tax decimal(7,2) , ws_net_profit decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_sales.dat* | gsfs://192.168.0.90:5003/web_sales.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with web_sales_err ; DROP FOREIGN TABLE IF EXISTS catalog_sales_ext; CREATE FOREIGN TABLE catalog_sales_ext ( cs_sold_date_sk bigint , cs_sold_time_sk bigint , cs_ship_date_sk bigint , cs_bill_customer_sk bigint , cs_bill_cdemo_sk bigint , cs_bill_hdemo_sk bigint , cs_bill_addr_sk 
bigint , cs_ship_customer_sk bigint , cs_ship_cdemo_sk bigint , cs_ship_hdemo_sk bigint , cs_ship_addr_sk bigint , cs_call_center_sk bigint , cs_catalog_page_sk bigint , cs_ship_mode_sk bigint , cs_warehouse_sk bigint , cs_item_sk bigint , cs_promo_sk bigint , cs_order_number bigint , cs_quantity bigint , cs_wholesale_cost decimal(7,2) , cs_list_price decimal(7,2) , cs_sales_price decimal(7,2) , cs_ext_discount_amt decimal(7,2) , cs_ext_sales_price decimal(7,2) , cs_ext_wholesale_cost decimal(7,2) , cs_ext_list_price decimal(7,2) , cs_ext_tax decimal(7,2) , cs_coupon_amt decimal(7,2) , cs_ext_ship_cost decimal(7,2) , cs_net_paid decimal(7,2) , cs_net_paid_inc_tax decimal(7,2) , cs_net_paid_inc_ship decimal(7,2) , cs_net_paid_inc_ship_tax decimal(7,2) , cs_net_profit decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/catalog_sales.dat* | gsfs://192.168.0.90:5003/catalog_sales.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with catalog_sales_err ;
DROP FOREIGN TABLE IF EXISTS store_sales_ext; CREATE FOREIGN TABLE store_sales_ext ( ss_sold_date_sk bigint , ss_sold_time_sk bigint , ss_item_sk bigint , ss_customer_sk bigint , ss_cdemo_sk bigint , ss_hdemo_sk bigint , ss_addr_sk bigint , ss_store_sk bigint , ss_promo_sk bigint , ss_ticket_number bigint , ss_quantity bigint , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/store_sales.dat* | gsfs://192.168.0.90:5003/store_sales.dat*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal' ) with store_sales_err ;

Run the following SQL statements to import the data.

    INSERT INTO customer_address SELECT * FROM customer_address_ext;
    INSERT INTO customer_demographics SELECT * FROM customer_demographics_ext;
    INSERT INTO date_dim SELECT * FROM date_dim_ext;
    INSERT INTO warehouse SELECT * FROM warehouse_ext;
    INSERT INTO ship_mode SELECT * FROM ship_mode_ext;
    INSERT INTO time_dim SELECT * FROM time_dim_ext;
    INSERT INTO reason SELECT * FROM reason_ext;
    INSERT INTO income_band SELECT * FROM income_band_ext;
    INSERT INTO item SELECT * FROM item_ext;
    INSERT INTO store SELECT * FROM store_ext;
    INSERT INTO call_center SELECT * FROM call_center_ext;
    INSERT INTO customer SELECT * FROM customer_ext;
    INSERT INTO web_site SELECT * FROM web_site_ext;
    INSERT INTO household_demographics SELECT * FROM household_demographics_ext;
    INSERT INTO web_page SELECT * FROM web_page_ext;
    INSERT INTO promotion SELECT * FROM promotion_ext;
    INSERT INTO catalog_page SELECT * FROM catalog_page_ext;
    INSERT INTO inventory SELECT * FROM inventory_ext;
    INSERT INTO catalog_returns SELECT * FROM catalog_returns_ext;
    INSERT INTO web_returns SELECT * FROM web_returns_ext;
    INSERT INTO store_returns SELECT * FROM store_returns_ext;
    INSERT INTO web_sales SELECT * FROM web_sales_ext;
    INSERT INTO catalog_sales SELECT * FROM catalog_sales_ext;
    INSERT INTO store_sales SELECT * FROM store_sales_ext;
  • Table row counts
    Table 1 TPC-DS

     No. | Table name             | Rows
    -----+------------------------+---------------
       1 | customer_address       | 6,000,000
       2 | customer_demographics  | 1,920,800
       3 | date_dim               | 73,049
       4 | warehouse              | 20
       5 | ship_mode              | 20
       6 | time_dim               | 86,400
       7 | reason                 | 65
       8 | income_band            | 20
       9 | item                   | 300,000
      10 | store                  | 1,002
      11 | call_center            | 42
      12 | customer               | 12,000,000
      13 | web_site               | 54
      14 | household_demographics | 7,200
      15 | web_page               | 3,000
      16 | promotion              | 1,500
      17 | catalog_page           | 30,000
      18 | inventory              | 783,000,000
      19 | catalog_returns        | 143,996,756
      20 | web_returns            | 71,997,522
      21 | store_returns          | 287,999,764
      22 | web_sales              | 720,000,376
      23 | catalog_sales          | 1,439,980,416
      24 | store_sales            | 2,879,987,999
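The row counts in Table 1 lend themselves to an automated check after the import. The following is a minimal Python sketch, not part of the original procedure: it builds one count statement per table and compares the results with the expected values. The function names and the `actual_rows` mapping are assumptions for illustration; how the statements are executed against the cluster is left to the reader.

```python
# Expected row counts copied from Table 1 (TPC-DS).
EXPECTED_ROWS = {
    "customer_address": 6_000_000,
    "customer_demographics": 1_920_800,
    "date_dim": 73_049,
    "warehouse": 20,
    "ship_mode": 20,
    "time_dim": 86_400,
    "reason": 65,
    "income_band": 20,
    "item": 300_000,
    "store": 1_002,
    "call_center": 42,
    "customer": 12_000_000,
    "web_site": 54,
    "household_demographics": 7_200,
    "web_page": 3_000,
    "promotion": 1_500,
    "catalog_page": 30_000,
    "inventory": 783_000_000,
    "catalog_returns": 143_996_756,
    "web_returns": 71_997_522,
    "store_returns": 287_999_764,
    "web_sales": 720_000_376,
    "catalog_sales": 1_439_980_416,
    "store_sales": 2_879_987_999,
}


def count_statements(tables=EXPECTED_ROWS):
    """Build one SELECT count(*) statement per table name."""
    return [f"SELECT count(*) FROM {name};" for name in tables]


def find_mismatches(actual_rows):
    """Return {table: (expected, actual)} for every table whose count differs.

    actual_rows is a hypothetical {table: count} mapping filled in by the
    caller from the query results; a missing table is reported as None.
    """
    return {
        name: (expected, actual_rows.get(name))
        for name, expected in EXPECTED_ROWS.items()
        if actual_rows.get(name) != expected
    }
```

An empty result from `find_mismatches` means every table matches Table 1.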
  • SQL20
    select i_item_id
          ,i_item_desc
          ,i_category
          ,i_class
          ,i_current_price
          ,sum(cs_ext_sales_price) as itemrevenue
          ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
              (partition by i_class) as revenueratio
    from catalog_sales
        ,item
        ,date_dim
    where cs_item_sk = i_item_sk
      and i_category in ('Sports', 'Shoes', 'Women')
      and cs_sold_date_sk = d_date_sk
      and d_date between cast('2001-03-21' as date)
                     and (cast('2001-03-21' as date) + 30)
    group by i_item_id
            ,i_item_desc
            ,i_category
            ,i_class
            ,i_current_price
    order by i_category
            ,i_class
            ,i_item_id
            ,i_item_desc
            ,revenueratio
    limit 100;
  • SQL18
    select i_item_id,
           ca_country,
           ca_state,
           ca_county,
           avg( cast(cs_quantity as decimal(12,2))) agg1,
           avg( cast(cs_list_price as decimal(12,2))) agg2,
           avg( cast(cs_coupon_amt as decimal(12,2))) agg3,
           avg( cast(cs_sales_price as decimal(12,2))) agg4,
           avg( cast(cs_net_profit as decimal(12,2))) agg5,
           avg( cast(c_birth_year as decimal(12,2))) agg6,
           avg( cast(cd1.cd_dep_count as decimal(12,2))) agg7
    from catalog_sales, customer_demographics cd1,
         customer_demographics cd2, customer, customer_address, date_dim, item
    where cs_sold_date_sk = d_date_sk and
          cs_item_sk = i_item_sk and
          cs_bill_cdemo_sk = cd1.cd_demo_sk and
          cs_bill_customer_sk = c_customer_sk and
          cd1.cd_gender = 'M' and
          cd1.cd_education_status = 'Primary' and
          c_current_cdemo_sk = cd2.cd_demo_sk and
          c_current_addr_sk = ca_address_sk and
          c_birth_month in (10,1,8,7,3,5) and
          d_year = 1998 and
          ca_state in ('NE','OK','NC'
                      ,'CO','ID','AR','MO')
    group by rollup (i_item_id, ca_country, ca_state, ca_county)
    order by ca_country,
             ca_state,
             ca_county,
             i_item_id
    limit 100;
  • SQL17
    select i_item_id
          ,i_item_desc
          ,s_state
          ,count(ss_quantity) as store_sales_quantitycount
          ,avg(ss_quantity) as store_sales_quantityave
          ,stddev_samp(ss_quantity) as store_sales_quantitystdev
          ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
          ,count(sr_return_quantity) as store_returns_quantitycount
          ,avg(sr_return_quantity) as store_returns_quantityave
          ,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
          ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov
          ,count(cs_quantity) as catalog_sales_quantitycount
          ,avg(cs_quantity) as catalog_sales_quantityave
          ,stddev_samp(cs_quantity) as catalog_sales_quantitystdev
          ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
    from store_sales
        ,store_returns
        ,catalog_sales
        ,date_dim d1
        ,date_dim d2
        ,date_dim d3
        ,store
        ,item
    where d1.d_quarter_name = '2000Q1'
      and d1.d_date_sk = ss_sold_date_sk
      and i_item_sk = ss_item_sk
      and s_store_sk = ss_store_sk
      and ss_customer_sk = sr_customer_sk
      and ss_item_sk = sr_item_sk
      and ss_ticket_number = sr_ticket_number
      and sr_returned_date_sk = d2.d_date_sk
      and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
      and sr_customer_sk = cs_bill_customer_sk
      and sr_item_sk = cs_item_sk
      and cs_sold_date_sk = d3.d_date_sk
      and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
    group by i_item_id
            ,i_item_desc
            ,s_state
    order by i_item_id
            ,i_item_desc
            ,s_state
    limit 100;
  • SQL7
    select i_item_id,
           avg(ss_quantity) agg1,
           avg(ss_list_price) agg2,
           avg(ss_coupon_amt) agg3,
           avg(ss_sales_price) agg4
    from store_sales, customer_demographics, date_dim, item, promotion
    where ss_sold_date_sk = d_date_sk and
          ss_item_sk = i_item_sk and
          ss_cdemo_sk = cd_demo_sk and
          ss_promo_sk = p_promo_sk and
          cd_gender = 'M' and
          cd_marital_status = 'U' and
          cd_education_status = 'College' and
          (p_channel_email = 'N' or p_channel_event = 'N') and
          d_year = 1999
    group by i_item_id
    order by i_item_id
    limit 100;
  • SQL13
    select avg(ss_quantity)
          ,avg(ss_ext_sales_price)
          ,avg(ss_ext_wholesale_cost)
          ,sum(ss_ext_wholesale_cost)
    from store_sales
        ,store
        ,customer_demographics
        ,household_demographics
        ,customer_address
        ,date_dim
    where s_store_sk = ss_store_sk
      and ss_sold_date_sk = d_date_sk and d_year = 2001
      and((ss_hdemo_sk=hd_demo_sk
      and cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'U'
      and cd_education_status = '4 yr Degree'
      and ss_sales_price between 100.00 and 150.00
      and hd_dep_count = 3
         )or
         (ss_hdemo_sk=hd_demo_sk
      and cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'D'
      and cd_education_status = '2 yr Degree'
      and ss_sales_price between 50.00 and 100.00
      and hd_dep_count = 1
         ) or
         (ss_hdemo_sk=hd_demo_sk
      and cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'S'
      and cd_education_status = 'Advanced Degree'
      and ss_sales_price between 150.00 and 200.00
      and hd_dep_count = 1
         ))
      and((ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('IL', 'WI', 'TN')
      and ss_net_profit between 100 and 200
         ) or
         (ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('MO', 'OK', 'WA')
      and ss_net_profit between 150 and 300
         ) or
         (ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('NE', 'VA', 'GA')
      and ss_net_profit between 50 and 250
         ))
    ;
  • SQL12
    select i_item_id
          ,i_item_desc
          ,i_category
          ,i_class
          ,i_current_price
          ,sum(ws_ext_sales_price) as itemrevenue
          ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
              (partition by i_class) as revenueratio
    from web_sales
        ,item
        ,date_dim
    where ws_item_sk = i_item_sk
      and i_category in ('Music', 'Shoes', 'Children')
      and ws_sold_date_sk = d_date_sk
      and d_date between cast('2000-05-14' as date)
                     and (cast('2000-05-14' as date) + 30 )
    group by i_item_id
            ,i_item_desc
            ,i_category
            ,i_class
            ,i_current_price
    order by i_category
            ,i_class
            ,i_item_id
            ,i_item_desc
            ,revenueratio
    limit 100;
  • SQL10
    select cd_gender,
           cd_marital_status,
           cd_education_status,
           count(*) cnt1,
           cd_purchase_estimate,
           count(*) cnt2,
           cd_credit_rating,
           count(*) cnt3,
           cd_dep_count,
           count(*) cnt4,
           cd_dep_employed_count,
           count(*) cnt5,
           cd_dep_college_count,
           count(*) cnt6
    from customer c,customer_address ca,customer_demographics
    where c.c_current_addr_sk = ca.ca_address_sk and
          ca_county in ('Clark County','Richardson County','Tom Green County','Sullivan County','Cass County') and
          cd_demo_sk = c.c_current_cdemo_sk and
          exists (select *
                  from store_sales,date_dim
                  where c.c_customer_sk = ss_customer_sk and
                        ss_sold_date_sk = d_date_sk and
                        d_year = 2000 and
                        d_moy between 1 and 1+3) and
          (exists (select *
                   from web_sales,date_dim
                   where c.c_customer_sk = ws_bill_customer_sk and
                         ws_sold_date_sk = d_date_sk and
                         d_year = 2000 and
                         d_moy between 1 and 1+3) or
           exists (select *
                   from catalog_sales,date_dim
                   where c.c_customer_sk = cs_ship_customer_sk and
                         cs_sold_date_sk = d_date_sk and
                         d_year = 2000 and
                         d_moy between 1 and 1+3))
    group by cd_gender,
             cd_marital_status,
             cd_education_status,
             cd_purchase_estimate,
             cd_credit_rating,
             cd_dep_count,
             cd_dep_employed_count,
             cd_dep_college_count
    order by cd_gender,
             cd_marital_status,
             cd_education_status,
             cd_purchase_estimate,
             cd_credit_rating,
             cd_dep_count,
             cd_dep_employed_count,
             cd_dep_college_count
    limit 100;
  • SQL3
    select dt.d_year
          ,item.i_brand_id brand_id
          ,item.i_brand brand
          ,sum(ss_ext_sales_price) sum_agg
    from date_dim dt
        ,store_sales
        ,item
    where dt.d_date_sk = store_sales.ss_sold_date_sk
      and store_sales.ss_item_sk = item.i_item_sk
      and item.i_manufact_id = 125
      and dt.d_moy=11
    group by dt.d_year
            ,item.i_brand
            ,item.i_brand_id
    order by dt.d_year
            ,sum_agg desc
            ,brand_id
    limit 100;
  • SQL5
    with ssr as
     (select s_store_id,
             sum(sales_price) as sales,
             sum(profit) as profit,
             sum(return_amt) as returns,
             sum(net_loss) as profit_loss
      from
       ( select ss_store_sk as store_sk,
                ss_sold_date_sk as date_sk,
                ss_ext_sales_price as sales_price,
                ss_net_profit as profit,
                cast(0 as decimal(7,2)) as return_amt,
                cast(0 as decimal(7,2)) as net_loss
         from store_sales
         union all
         select sr_store_sk as store_sk,
                sr_returned_date_sk as date_sk,
                cast(0 as decimal(7,2)) as sales_price,
                cast(0 as decimal(7,2)) as profit,
                sr_return_amt as return_amt,
                sr_net_loss as net_loss
         from store_returns
       ) salesreturns,
       date_dim,
       store
      where date_sk = d_date_sk
        and d_date between cast('2002-08-05' as date)
                       and (cast('2002-08-05' as date) + 14 )
        and store_sk = s_store_sk
      group by s_store_id)
     ,
     csr as
     (select cp_catalog_page_id,
             sum(sales_price) as sales,
             sum(profit) as profit,
             sum(return_amt) as returns,
             sum(net_loss) as profit_loss
      from
       ( select cs_catalog_page_sk as page_sk,
                cs_sold_date_sk as date_sk,
                cs_ext_sales_price as sales_price,
                cs_net_profit as profit,
                cast(0 as decimal(7,2)) as return_amt,
                cast(0 as decimal(7,2)) as net_loss
         from catalog_sales
         union all
         select cr_catalog_page_sk as page_sk,
                cr_returned_date_sk as date_sk,
                cast(0 as decimal(7,2)) as sales_price,
                cast(0 as decimal(7,2)) as profit,
                cr_return_amount as return_amt,
                cr_net_loss as net_loss
         from catalog_returns
       ) salesreturns,
       date_dim,
       catalog_page
      where date_sk = d_date_sk
        and d_date between cast('2002-08-05' as date)
                       and (cast('2002-08-05' as date) + 14 )
        and page_sk = cp_catalog_page_sk
      group by cp_catalog_page_id)
     ,
     wsr as
     (select web_site_id,
             sum(sales_price) as sales,
             sum(profit) as profit,
             sum(return_amt) as returns,
             sum(net_loss) as profit_loss
      from
       ( select ws_web_site_sk as wsr_web_site_sk,
                ws_sold_date_sk as date_sk,
                ws_ext_sales_price as sales_price,
                ws_net_profit as profit,
                cast(0 as decimal(7,2)) as return_amt,
                cast(0 as decimal(7,2)) as net_loss
         from web_sales
         union all
         select ws_web_site_sk as wsr_web_site_sk,
                wr_returned_date_sk as date_sk,
                cast(0 as decimal(7,2)) as sales_price,
                cast(0 as decimal(7,2)) as profit,
                wr_return_amt as return_amt,
                wr_net_loss as net_loss
         from web_returns left outer join web_sales on
              ( wr_item_sk = ws_item_sk
                and wr_order_number = ws_order_number)
       ) salesreturns,
       date_dim,
       web_site
      where date_sk = d_date_sk
        and d_date between cast('2002-08-05' as date)
                       and (cast('2002-08-05' as date) + 14 )
        and wsr_web_site_sk = web_site_sk
      group by web_site_id)
    select channel
         , id
         , sum(sales) as sales
         , sum(returns) as returns
         , sum(profit) as profit
    from
     (select 'store channel' as channel
           , 'store' || s_store_id as id
           , sales
           , returns
           , (profit - profit_loss) as profit
      from ssr
      union all
      select 'catalog channel' as channel
           , 'catalog_page' || cp_catalog_page_id as id
           , sales
           , returns
           , (profit - profit_loss) as profit
      from csr
      union all
      select 'web channel' as channel
           , 'web_site' || web_site_id as id
           , sales
           , returns
           , (profit - profit_loss) as profit
      from wsr
     ) x
    group by rollup (channel, id)
    order by channel
            ,id
    limit 100;
  • Command generation method
    The 99 standard TPC-DS SQL queries can be generated as follows:

    1. Preparation. Before generating the TPC-DS queries, modify the files in the query_templates directory:
       a. Log in to the ECS requested for the test and go to the /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates directory.

            cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates

       b. Create a file named hwdws.tpl with the following content:

            define __LIMITA = "";
            define __LIMITB = "";
            define __LIMITC = "limit %d";
            define _BEGIN = "-- begin query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];
            define _END = "-- end query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];

       c. The SQL generation template shipped with the TPC-DS tool contains a syntax error, so query77.tpl must be modified: on line 135, change ', coalesce(returns, 0) returns' to ', coalesce(returns, 0) as returns'.

    2. Run the following command to generate the queries.

            ./dsqgen -input ../query_templates/templates.lst -directory ../query_templates/ -scale 1000 -dialect hwdws

       The command produces the file query_0.sql, which contains the 99 standard SQL statements and must be split manually into 99 files.

    3. The generated standard queries use the following date-arithmetic syntax, which DWS does not yet support. Change it manually:

            Q5:  and (cast('2001-08-19' as date) + 14 days)
                 change to: and (cast('2001-08-19' as date) + 14)
            Q12: and (cast('1999-02-28' as date) + 30 days)
                 change to: and (cast('1999-02-28' as date) + 30)
            Q16: (cast('1999-4-01' as date) + 60 days)
                 change to: (cast('1999-4-01' as date) + 60)
            Q20: and (cast('1998-05-05' as date) + 30 days)
                 change to: and (cast('1998-05-05' as date) + 30)
            Q21: and d_date between (cast ('2000-05-19' as date) - 30 days)
                 change to: and d_date between (cast ('2000-05-19' as date) - 30)
                 and (cast ('2000-05-19' as date) + 30 days)
                 change to: and (cast ('2000-05-19' as date) + 30)
            Q32: (cast('1999-02-22' as date) + 90 days)
                 change to: (cast('1999-02-22' as date) + 90)
            Q37: and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60 days)
                 change to: and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60)
            Q40: and d_date between (cast ('2002-05-10' as date) - 30 days)
                 change to: and d_date between (cast ('2002-05-10' as date) - 30)
                 and (cast ('2002-05-10' as date) + 30 days)
                 change to: and (cast ('2002-05-10' as date) + 30)
            Q77: and (cast('1999-08-29' as date) + 30 days)
                 change to: and (cast('1999-08-29' as date) + 30)
            Q80: and (cast('2002-08-04' as date) + 30 days)
                 change to: and (cast('2002-08-04' as date) + 30)
            Q82: and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60 days)
                 change to: and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60)
            Q92: (cast('2001-01-26' as date) + 90 days)
                 change to: (cast('2001-01-26' as date) + 90)
            Q94: (cast('1999-5-01' as date) + 60 days)
                 change to: (cast('1999-5-01' as date) + 60)
            Q95: (cast('1999-4-01' as date) + 60 days)
                 change to: (cast('1999-4-01' as date) + 60)
            Q98: and (cast('2002-04-01' as date) + 30 days)
                 change to: and (cast('2002-04-01' as date) + 30)
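Splitting query_0.sql and applying the date-arithmetic changes listed above are both mechanical, so they can be scripted. The Python sketch below is one possible approach and not part of the original procedure. It assumes that each query in query_0.sql starts with the "-- begin query ..." marker defined by _BEGIN in hwdws.tpl and that every unsupported expression has the form "<number> days)". The function names and the output file naming are hypothetical.

```python
import re
from pathlib import Path

# Matches "<number> days)" and keeps "<number>)". The closing parenthesis is
# required, so text such as a quoted alias "30 days" is left untouched.
DAYS_RE = re.compile(r"(\d+)\s+days\s*\)")

# Assumed marker format, taken from the _BEGIN definition in hwdws.tpl.
BEGIN_RE = re.compile(
    r"^-- begin query (\d+) in stream (\d+) using template (\S+)", re.MULTILINE
)


def fix_date_arithmetic(sql: str) -> str:
    """Rewrite "+ 14 days)" style expressions to "+ 14)"."""
    return DAYS_RE.sub(r"\1)", sql)


def split_queries(text: str) -> dict:
    """Return {template_name: fixed_query_text}, one entry per begin marker."""
    matches = list(BEGIN_RE.finditer(text))
    queries = {}
    for i, match in enumerate(matches):
        # A query runs from its begin marker to the next begin marker (or EOF).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.start():end].strip() + "\n"
        queries[match.group(3)] = fix_date_arithmetic(body)
    return queries


def write_queries(source: str, out_dir: str) -> int:
    """Write each query to <out_dir>/<template stem>.sql; return the file count."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    queries = split_queries(Path(source).read_text())
    for template, sql in queries.items():
        (target / (Path(template).stem + ".sql")).write_text(sql)
    return len(queries)
```

For example, `write_queries("query_0.sql", "queries")` should report 99 files if all markers are found; it is still worth reviewing the rewritten queries against the list above.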
  • SQL2
    with wscs as
     (select sold_date_sk
            ,sales_price
      from (select ws_sold_date_sk sold_date_sk
                  ,ws_ext_sales_price sales_price
            from web_sales
            union all
            select cs_sold_date_sk sold_date_sk
                  ,cs_ext_sales_price sales_price
            from catalog_sales)),
     wswscs as
     (select d_week_seq,
             sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales,
             sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales,
             sum(case when (d_day_name='Tuesday') then sales_price else null end) tue_sales,
             sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales,
             sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales,
             sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales,
             sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales
      from wscs
          ,date_dim
      where d_date_sk = sold_date_sk
      group by d_week_seq)
    select d_week_seq1
          ,round(sun_sales1/sun_sales2,2)
          ,round(mon_sales1/mon_sales2,2)
          ,round(tue_sales1/tue_sales2,2)
          ,round(wed_sales1/wed_sales2,2)
          ,round(thu_sales1/thu_sales2,2)
          ,round(fri_sales1/fri_sales2,2)
          ,round(sat_sales1/sat_sales2,2)
    from
     (select wswscs.d_week_seq d_week_seq1
            ,sun_sales sun_sales1
            ,mon_sales mon_sales1
            ,tue_sales tue_sales1
            ,wed_sales wed_sales1
            ,thu_sales thu_sales1
            ,fri_sales fri_sales1
            ,sat_sales sat_sales1
      from wswscs,date_dim
      where date_dim.d_week_seq = wswscs.d_week_seq and
            d_year = 1999) y,
     (select wswscs.d_week_seq d_week_seq2
            ,sun_sales sun_sales2
            ,mon_sales mon_sales2
            ,tue_sales tue_sales2
            ,wed_sales wed_sales2
            ,thu_sales thu_sales2
            ,fri_sales fri_sales2
            ,sat_sales sat_sales2
      from wswscs
          ,date_dim
      where date_dim.d_week_seq = wswscs.d_week_seq and
            d_year = 1999+1) z
    where d_week_seq1=d_week_seq2-53
    order by d_week_seq1;
  • Create a foreign server
    Perform this step only when Hive is connected to OBS. Skip it when Hive is connected to HDFS.

    1. Use Data Studio to connect to the DWS cluster you created.
    2. Run the following statement to create the foreign server. The {AK value} and {SK value} are obtained during environment preparation.

            CREATE SERVER obs_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS
            (
              address 'obs.xxx.com:5443',   -- OBS access address.
              encrypt 'on',
              access_key '{AK value}',
              secret_access_key '{SK value}',
              type 'obs'
            );

    3. View the foreign server.

            SELECT * FROM pg_foreign_server WHERE srvname='obs_server';

       A result like the following indicates that the server was created successfully:

              srvname   | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
            ------------+----------+--------+---------+------------+--------+-------------------------------------------------------------------------------------
             obs_server |    16476 |  14337 |         |            |        | {address=obs.xxx.com:5443,type=obs,encrypt=on,access_key=***,secret_access_key=***}
            (1 row)
  • Export data
    1. Create the local source table.

            DROP TABLE IF EXISTS product_info_export;
            CREATE TABLE product_info_export
            (
                product_price              integer      ,
                product_id                 char(30)     ,
                product_time               date         ,
                product_level              char(10)     ,
                product_name               varchar(200) ,
                product_type1              varchar(20)  ,
                product_type2              char(10)     ,
                product_monthly_sales_cnt  integer      ,
                product_comment_time       date         ,
                product_comment_num        integer      ,
                product_comment_content    varchar(200)
            ) ;
            INSERT INTO product_info_export SELECT * FROM product_info;

    2. Create the target table on the Hive side.

            DROP TABLE product_info_orc_export;
            CREATE TABLE product_info_orc_export
            (
                product_price              int          ,
                product_id                 char(30)     ,
                product_time               date         ,
                product_level              char(10)     ,
                product_name               varchar(200) ,
                product_type1              varchar(20)  ,
                product_type2              char(10)     ,
                product_monthly_sales_cnt  int          ,
                product_comment_time       date         ,
                product_comment_num        int          ,
                product_comment_content    varchar(200)
            )
            row format delimited fields terminated by ','
            stored as orc;

    3. Import the data from the local source table into the Hive table.

            INSERT INTO ex1.product_info_orc_export SELECT * FROM product_info_export;

    4. Query the import result on the Hive side.

            SELECT * FROM product_info_orc_export;
  • Import data
    1. Create the local target table.

            DROP TABLE IF EXISTS product_info;
            CREATE TABLE product_info
            (
                product_price              integer      ,
                product_id                 char(30)     ,
                product_time               date         ,
                product_level              char(10)     ,
                product_name               varchar(200) ,
                product_type1              varchar(20)  ,
                product_type2              char(10)     ,
                product_monthly_sales_cnt  integer      ,
                product_comment_time       date         ,
                product_comment_num        integer      ,
                product_comment_content    varchar(200)
            ) ;

    2. Import the data from the Hive table into the target table.

            INSERT INTO product_info SELECT * FROM ex1.product_info_orc;

    3. Query the import result.

            SELECT * FROM product_info;
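  • Basic process
    This practice takes about 1 hour. The basic process is as follows:
    1. Create an MRS analysis cluster (the Hive component must be selected to use this feature).
    2. Create a table on the Hive side.
    3. Insert data on the Hive side, or upload a local txt data file to an OBS bucket, import it from the OBS bucket into Hive, and then load it from the txt-format table into the ORC-format table.
    4. Create an MRS data source connection.
    5. Create a foreign server.
    6. Create an EXTERNAL SCHEMA.
    7. Import into or read from the Hive table through the EXTERNAL SCHEMA.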
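  • Constraints and limitations
    Currently, only SELECT, INSERT, and INSERT OVERWRITE are supported on tables in the Hive database that the EXTERNAL SCHEMA maps to. No other operations are supported.
    Table 1 lists the operations supported for each format on the two MRS data sources.

    Table 1 Operations supported by the two MRS data sources

     Data source | Table type      | Operation               | TEXT | CSV | PARQUET | ORC
    -------------+-----------------+-------------------------+------+-----+---------+-----
     HDFS        | Non-partitioned | SELECT                  | √    | √   | √       | √
     HDFS        | Non-partitioned | INSERT/INSERT OVERWRITE | x    | x   | x       | √
     HDFS        | Partitioned     | SELECT                  | √    | √   | √       | √
     HDFS        | Partitioned     | INSERT/INSERT OVERWRITE | x    | x   | x       | √
     OBS         | Non-partitioned | SELECT                  | √    | √   | √       | √
     OBS         | Non-partitioned | INSERT/INSERT OVERWRITE | x    | x   | x       | √
     OBS         | Partitioned     | SELECT                  | x    | x   | √       | √
     OBS         | Partitioned     | INSERT/INSERT OVERWRITE | x    | x   | x       | x

    Transaction atomicity is not guaranteed. If a transaction fails, data consistency is not guaranteed and rollback is not supported.
    GRANT and REVOKE are not supported on tables created on the Hive side and accessed through the EXTERNAL SCHEMA.
    Concurrency: concurrent reads and writes by DWS, Hive, and Spark can cause dirty reads. Concurrent operations that include INSERT OVERWRITE on the same non-partitioned table, or on the same partition of the same partitioned table, cannot be guaranteed to produce the expected result. Do not run such operations.
    The HiveMetaStore feature does not support the federation mechanism.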
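When a script has to decide whether an operation is allowed before issuing it, the support matrix in Table 1 can be encoded as a lookup. The Python sketch below is only an illustration of that table; the `SUPPORT` mapping and the `is_supported` function are hypothetical names, and the values are transcribed from Table 1 rather than queried from the service.

```python
ALL_FORMATS = frozenset({"TEXT", "CSV", "PARQUET", "ORC"})

# (data source, partitioned table?, operation) -> formats marked as supported.
# INSERT and INSERT OVERWRITE share one row in Table 1, so both map to "INSERT".
SUPPORT = {
    ("HDFS", False, "SELECT"): ALL_FORMATS,
    ("HDFS", False, "INSERT"): frozenset({"ORC"}),
    ("HDFS", True, "SELECT"): ALL_FORMATS,
    ("HDFS", True, "INSERT"): frozenset({"ORC"}),
    ("OBS", False, "SELECT"): ALL_FORMATS,
    ("OBS", False, "INSERT"): frozenset({"ORC"}),
    ("OBS", True, "SELECT"): frozenset({"PARQUET", "ORC"}),
    ("OBS", True, "INSERT"): frozenset(),
}


def is_supported(source, partitioned, operation, file_format):
    """Return True if Table 1 marks the combination as supported."""
    op = operation.upper()
    if op.startswith("INSERT"):
        op = "INSERT"
    formats = SUPPORT.get((source.upper(), bool(partitioned), op), frozenset())
    return file_format.upper() in formats
```

Any combination that is not listed in the table, including operations other than SELECT and INSERT, is treated as unsupported.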
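  • Create an MRS data source connection
    1. Log in to the DWS management console and click the DWS cluster you created. Make sure the DWS cluster and the MRS cluster are in the same region and availability zone and in the same VPC subnet.
    2. Switch to "MRS Data Sources" and click "Create MRS Data Source Connection".
    3. Configure the following parameters and click "Confirm".
       Data source name: mrs_server
       Configuration method: MRS user
       MRS data source: select the mrs_01 cluster created earlier.
       MRS user: admin
       User password: the admin password of the MRS data source created earlier.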
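  • Precautions
    Only users with the INSERT permission on a table can insert data into it.
    To use the RETURNING clause, the user must have the SELECT permission on the table.
    To insert rows produced by a query (the query clause), the user must also have the SELECT permission on the tables used in the query.
    To overwrite data with the OVERWRITE clause, the user must also have the SELECT and TRUNCATE permissions on the table.
    When connected to a TD-compatible database with the td_compatible_truncation parameter set to on, automatic truncation of overlong strings is enabled. In subsequent INSERT statements (excluding foreign-table scenarios), when an overlong string is inserted into a char or varchar column of the target table, the system truncates it to the maximum length defined for that column.
    If multi-byte character data (such as Chinese characters) is inserted into a database whose character set uses a byte-type encoding (SQL_ASCII, LATIN1, and so on) and a character straddles the truncation position, the string is truncated by byte length and the tail of the truncated string may contain unexpected data. If the truncation result must be correct, use an encoding that can truncate by character, such as UTF8, as the database encoding.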
  • Syntax
    [ WITH [ RECURSIVE ] with_query [, ...] ]
    INSERT [ IGNORE | OVERWRITE ] INTO table_name [ AS alias ] [ ( column_name [, ...] ) ]
        { DEFAULT VALUES
        | VALUES {( { expression | DEFAULT } [, ...] ) }[, ...]
        | query }
        [ ON DUPLICATE KEY duplicate_action | ON CONFLICT [ conflict_target ] conflict_action ]
        [ RETURNING {* | {output_expression [ [ AS ] output_name ] }[, ...]} ];

    where duplicate_action can be:

        UPDATE { column_name = { expression | DEFAULT } |
                 ( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] )
               } [, ...]

    and conflict_target can be one of:

        ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]
        ON CONSTRAINT constraint_name

    and conflict_action is one of:

        DO NOTHING
        DO UPDATE SET { column_name = { expression | DEFAULT } |
                        ( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] )
                      } [, ...]
                  [ WHERE condition ]
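  • Parameter description
    WITH [ RECURSIVE ] with_query [, ...]
        Declares one or more subqueries that can be referenced by name in the main query, similar to temporary tables.
        If RECURSIVE is specified, a SELECT subquery is allowed to reference itself by name.
        The detailed format of with_query is:
            with_query_name [ ( column_name [, ...] ) ] AS ( {select | values | insert | update | delete} )
        – with_query_name specifies the name of the result set produced by the subquery. The name can be used in the query to access that result set.
        – column_name specifies the column names shown in the subquery result set.
        – Each subquery can be a SELECT, VALUES, INSERT, UPDATE, or DELETE statement.
    IGNORE
        Ignores conflicting rows when a primary key or unique constraint conflict occurs. For details, see UPSERT.
    OVERWRITE
        Marks an overwrite insert. After the statement completes, the original data of the target is cleared and only the newly inserted data remains.
        OVERWRITE supports inserting into specified columns. The other columns take their default values, or NULL if no default exists.
        Do not run OVERWRITE concurrently with real-time writes such as INSERT INTO. Otherwise the data written in real time may be cleared unexpectedly.
        OVERWRITE is intended for bulk data import and is not recommended for inserting small amounts of data.
        Avoid concurrent insert overwrite operations on the same table. Otherwise an error similar to "tuple concurrently updated." is reported.
        If the cluster is being scaled out or in and the table written by INSERT OVERWRITE needs data redistribution, INSERT OVERWRITE clears the current data and automatically distributes the inserted data across the nodes that exist after scaling. If INSERT OVERWRITE runs at the same time as the redistribution of that table, INSERT OVERWRITE interrupts the redistribution.
    table_name
        Name of the target table into which data is inserted.
        Value range: an existing table name.
    AS
        Specifies an alias for the target table table_name. alias is the name of the alias.
    column_name
        Name of a column in the target table:
        A column name can be qualified with a subfield name or an array subscript.
        Each column that does not appear in the column list is filled with the system default value or the default declared for the column, or with NULL if neither exists. For example, when data is inserted into only some fields of a composite type, the other fields are NULL.
        The target columns (column_name) can be listed in any order. If no columns are listed, all columns are used by default, in the order in which the table declares them.
        If the VALUES clause or query supplies only N columns, the first N columns are the target columns.
        The values supplied by the VALUES clause or query are associated with the corresponding columns from left to right.
        Value range: an existing column name.
    expression
        A valid expression or value to assign to the corresponding column:
        To insert a single quotation mark into a column, escape it with another single quotation mark.
        If the expression for an inserted row is not of the correct data type, the system attempts a type conversion. If the conversion fails, the insert fails and the system returns an error message.
        Example:

            CREATE TABLE tt01 (id int,content varchar(50));
            NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using round-robin as the distribution mode by default.
            HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
            CREATE TABLE
            INSERT INTO tt01 values (1,'Jack say ''hello''');
            INSERT 0 1
            INSERT INTO tt01 values (2,'Rose do 50%');
            INSERT 0 1
            INSERT INTO tt01 values (3,'Lilei say ''world''');
            INSERT 0 1
            INSERT INTO tt01 values (4,'Hanmei do 100%');
            INSERT 0 1
            SELECT * FROM tt01;
             id |      content
            ----+-------------------
              3 | Lilei say 'world'
              4 | Hanmei do 100%
              1 | Jack say 'hello'
              2 | Rose do 50%
            (4 rows)

    DEFAULT
        The default value of the corresponding column. If there is no default value, the value is NULL.
    query
        A query (SELECT statement) whose result is used as the data to insert.
    ON DUPLICATE KEY
        Updates conflicting rows when a primary key or unique constraint conflict occurs.
        duplicate_action specifies the columns to update and the new values.
        For details, see UPSERT.
    ON CONFLICT
        Ignores or updates conflicting rows when a primary key or unique constraint conflict occurs.
        conflict_target specifies a column name (index_column_name), an expression containing multiple column names (index_expression), or a constraint name (constraint_name). It is used to infer, from the column name, expression, or constraint name, whether a unique index exists. index_column_name and index_expression follow the index column format of CREATE INDEX.
        conflict_action specifies the policy applied on a primary key or unique constraint conflict. There are two policies:
        DO NOTHING: ignore the conflict.
        DO UPDATE SET: update on conflict, followed by the columns to update and the new values.
        For details, see UPSERT.
    RETURNING
        Returns the rows actually inserted. The syntax of the RETURNING list is the same as that of the SELECT output list.
    output_expression
        An expression that the INSERT command computes after each row is inserted.
        Value range: the expression can use any column of the table. Use * to return all columns of the inserted row.
    output_name
        Output name of the column.
        Value range: a string that complies with the identifier naming rules.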
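The quoting rule described under expression (a single quotation mark inside a string is escaped by doubling it) can be illustrated with a small helper. This is a Python sketch for illustration only, and `sql_string_literal` is a hypothetical name. When a database driver is available, parameter binding is generally preferable to building literals by hand.

```python
def sql_string_literal(value: str) -> str:
    """Wrap value in single quotes, doubling every embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def insert_statement(table: str, row_id: int, content: str) -> str:
    """Build an INSERT in the style of the tt01 example (illustration only)."""
    return f"INSERT INTO {table} values ({row_id},{sql_string_literal(content)});"
```

With the data from the tt01 example, `insert_statement("tt01", 1, "Jack say 'hello'")` reproduces the first INSERT statement shown above.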
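  • Parameter description
    Table 1 Conflicting lock modes (X: the two modes conflict, -: the two modes are compatible)

     Requested mode / Current mode | ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
    -------------------------------+--------------+-----------+---------------+------------------------+-------+---------------------+-----------+------------------
     ACCESS SHARE                  | -            | -         | -             | -                      | -     | -                   | -         | X
     ROW SHARE                     | -            | -         | -             | -                      | -     | -                   | X         | X
     ROW EXCLUSIVE                 | -            | -         | -             | -                      | X     | X                   | X         | X
     SHARE UPDATE EXCLUSIVE        | -            | -         | -             | X                      | X     | X                   | X         | X
     SHARE                         | -            | -         | X             | X                      | -     | X                   | X         | X
     SHARE ROW EXCLUSIVE           | -            | -         | X             | X                      | X     | X                   | X         | X
     EXCLUSIVE                     | -            | X         | X             | X                      | X     | X                   | X         | X
     ACCESS EXCLUSIVE              | X            | X         | X             | X                      | X     | X                   | X         | X

    The parameters of LOCK are described below:
    name
        Name of the table to lock. The name can be schema-qualified.
        Tables are locked in the order in which they are listed in the LOCK TABLE command.
        Value range: an existing table name.
    ONLY
        If ONLY is specified, only that table is locked. If it is not specified, the table and all its child tables are locked.
    ACCESS SHARE
        The ACCESS SHARE lock allows only reads of the table and prohibits modifications. Every SQL statement that reads a table without modifying it automatically requests this lock. For example, the SELECT command automatically requests this lock on the tables it references.
    ROW SHARE
        The ROW SHARE lock allows concurrent reads of the table. According to Table 1, it conflicts only with the EXCLUSIVE and ACCESS EXCLUSIVE lock modes.
        The SELECT FOR UPDATE and SELECT FOR SHARE commands automatically request a ROW SHARE lock on the target table (and an ACCESS SHARE lock on all other referenced tables that are not FOR SHARE/FOR UPDATE).
    ROW EXCLUSIVE
        Like ROW SHARE, the ROW EXCLUSIVE lock allows concurrent reads of the table, and it also allows the data in the table to be modified. The UPDATE, DELETE, and INSERT commands automatically request this lock on the target table (and an ACCESS SHARE lock on all other referenced tables). In general, every command that modifies table data requests a ROW EXCLUSIVE lock on the table.
    SHARE UPDATE EXCLUSIVE
        This mode protects a table against concurrent schema changes and prevents the garbage collection command (VACUUM) from running on the target table.
        VACUUM (without the FULL option), ANALYZE, and CREATE INDEX CONCURRENTLY automatically request this lock.
    SHARE
        The SHARE lock allows concurrent queries but prohibits modifications to the table.
        CREATE INDEX (without the CONCURRENTLY option) automatically requests this lock.
    SHARE ROW EXCLUSIVE
        The SHARE ROW EXCLUSIVE lock prohibits any concurrent modification of the table. It is self-exclusive, so only one session can hold it at a time.
        No SQL statement requests this lock mode automatically.
    EXCLUSIVE
        The EXCLUSIVE lock allows concurrent queries on the target table but prohibits any other operation.
        This mode is compatible only with concurrent ACCESS SHARE locks. That is, only reads of the table can proceed in parallel with a transaction holding this lock mode.
        No SQL statement requests this lock mode automatically on user tables. Some operations, however, request it on certain system catalogs.
    ACCESS EXCLUSIVE
        This mode guarantees that its holder (a transaction) is the only transaction accessing the table.
        ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM FULL automatically request this lock.
        It is the default lock mode when a LOCK TABLE command does not specify a mode explicitly.
    NOWAIT
        Specifies that LOCK TABLE does not wait for conflicting locks to be released. If the lock cannot be acquired immediately, the command exits and reports an error.
        When a table-level lock is acquired without NOWAIT and a conflicting lock exists, the command waits for that lock to be released.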
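The conflict matrix in Table 1 can be transcribed into a small lookup, which is handy when reasoning about whether two statements will block each other. The Python sketch below is an illustration built from that table only; `MODES`, `MATRIX`, and `conflicts` are hypothetical names, and the database itself remains the authority on locking behavior.

```python
# Lock modes in the column order of Table 1.
MODES = [
    "ACCESS SHARE",
    "ROW SHARE",
    "ROW EXCLUSIVE",
    "SHARE UPDATE EXCLUSIVE",
    "SHARE",
    "SHARE ROW EXCLUSIVE",
    "EXCLUSIVE",
    "ACCESS EXCLUSIVE",
]

# One string per requested mode (row); each character is one current mode
# (column). "X" means the two modes conflict, "-" means they are compatible.
MATRIX = [
    "-------X",
    "------XX",
    "----XXXX",
    "---XXXXX",
    "--XX-XXX",
    "--XXXXXX",
    "-XXXXXXX",
    "XXXXXXXX",
]


def conflicts(requested: str, current: str) -> bool:
    """Return True if the requested mode conflicts with the held mode."""
    return MATRIX[MODES.index(requested)][MODES.index(current)] == "X"
```

For example, `conflicts("SHARE", "ROW EXCLUSIVE")` is true, which is why LOCK TABLE ... IN SHARE MODE has to wait for transactions that are writing to the table.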
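  • Precautions
    LOCK TABLE is useful only inside a transaction block, because the lock is released when the transaction ends. A LOCK TABLE command issued outside any transaction block reports an error.
    If no lock mode is specified, the most restrictive mode, ACCESS EXCLUSIVE, is used.
    LOCK TABLE ... IN ACCESS SHARE MODE requires the SELECT privilege on the target table. All other forms of LOCK require the UPDATE and/or DELETE privilege.
    There is no UNLOCK TABLE command. Locks are always released at the end of the transaction.
    LOCK TABLE handles only table-level locks, so the mode names containing "ROW" are misnomers. These names are best read as indicating that the user intends to acquire row-level locks within the locked table. Likewise, ROW EXCLUSIVE mode is a sharable table-level lock. Note that, as far as LOCK TABLE is concerned, all lock modes have identical semantics and differ only in the rules about which modes conflict with which. For the rules, see Table 1.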
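  • Function
    LOCK TABLE acquires a table-level lock.
    When GaussDB(DWS) automatically acquires locks for commands that reference tables, it always uses the least restrictive lock mode possible. If you need a more restrictive lock mode, use the LOCK command. For example, suppose an application runs a transaction at the Read Committed isolation level and needs the data in a table to stay stable for the duration of the transaction. To achieve this, lock the table in SHARE mode before querying it. This prevents concurrent data changes and ensures that subsequent queries see a stable view of committed data. The reason is that SHARE mode conflicts with the ROW EXCLUSIVE mode required by any write, so a LOCK TABLE name IN SHARE MODE statement waits until every transaction currently holding a ROW EXCLUSIVE lock has committed or rolled back. Therefore, once the lock is obtained, no uncommitted writes are outstanding, and no new write can begin until the lock is released.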
  • Syntax
        LOCK [ TABLE ] {[ ONLY ] name [, ...]| {name [ * ]} [, ...]}
            [ IN {ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE} MODE ]
            [ NOWAIT ];
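The Read Committed scenario from the function description might look like the following sketch. It reuses the tt01 table (columns id and content) from the INSERT example. The queries run under the lock are illustrative assumptions, not a prescribed pattern.

```sql
-- LOCK TABLE must run inside a transaction block.
START TRANSACTION;

-- SHARE conflicts with ROW EXCLUSIVE, so this waits for in-flight writers
-- to commit or roll back, then blocks new writers until COMMIT.
LOCK TABLE tt01 IN SHARE MODE;

-- Both queries see the same committed data, even at Read Committed.
SELECT count(*) FROM tt01;
SELECT * FROM tt01 ORDER BY id;

-- There is no UNLOCK TABLE; ending the transaction releases the lock.
COMMIT;

-- NOWAIT variant: report an error immediately instead of waiting
-- if another transaction holds a conflicting lock.
START TRANSACTION;
LOCK TABLE tt01 IN ACCESS EXCLUSIVE MODE NOWAIT;
COMMIT;
```

SHARE is used here rather than a stronger mode because, per Table 1, it still allows other sessions to read the table and to take their own SHARE locks.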
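  • Parameter description
    expression
        A constant or expression to compute and insert at the corresponding position in the result table.
        In a VALUES list at the top level of an INSERT, an expression can be replaced by DEFAULT to indicate that the default value of the target column is inserted. DEFAULT cannot be used when VALUES appears in any other context.
    sort_expression
        An expression or integer constant that indicates how to sort the result rows.
    ASC
        Sorts in ascending order.
    DESC
        Sorts in descending order.
    operator
        A sort operator.
    count
        The maximum number of rows to return.
    start
        The number of rows to skip before rows start to be returned.
    FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
        The FETCH clause limits the total number of rows returned, counting from the first row of the query result. The default value of count is 1.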
  • Syntax
        VALUES {( expression [, ...] )} [, ...]
            [ ORDER BY { sort_expression [ ASC | DESC | USING operator ] } [, ...] ]
            [ { [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] } | { LIMIT start, { count | ALL } } ]
            [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ];
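A few standalone VALUES statements, sketched here with made-up literal rows, show how the optional clauses in the syntax above combine. The integer 1 in ORDER BY is a sort_expression that refers to the first output column.

```sql
-- Three literal rows with two columns each.
VALUES (1, 'one'), (2, 'two'), (3, 'three');

-- Sort by the first column in descending order and keep at most two rows:
-- returns (3, 'three') followed by (2, 'two').
VALUES (1, 'one'), (2, 'two'), (3, 'three')
    ORDER BY 1 DESC
    LIMIT 2;

-- Skip the first row of the sorted result, then return the rest.
VALUES (1, 'one'), (2, 'two'), (3, 'three')
    ORDER BY 1
    LIMIT ALL OFFSET 1;

-- FETCH form: count is omitted, so it defaults to 1 and a single row is returned.
VALUES (1, 'one'), (2, 'two'), (3, 'three')
    ORDER BY 1
    FETCH FIRST ROW ONLY;
```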
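  • Viewing deployed resources and testing the network connection
    1. Log in to the Huawei Cloud console and select the "CN North-Beijing4" region.
       Figure 2 Huawei Cloud console
    2. On the Virtual Private Cloud (VPC) console, view the VPCs created by the solution's one-click deployment, together with their subnets, route tables, and Elastic Cloud Servers (ECSs).
       Figure 3 Virtual Private Cloud (VPC) console
       Figure 4 VPC instances
    3. On the Elastic Load Balance console, view the load balancer created by the one-click deployment.
       Figure 5 Elastic load balancer instance
    4. On the Elastic Cloud Server console, view the ECSs created by the one-click deployment.
       Figure 6 ECS instances
    5. Deploy a service on the ECSs, and view its details in the backend server group of the corresponding load balancer. (This document uses Nginx as the example service.)
       Figure 7 Backend server group details
    6. Verify that the cross-VPC backend servers were added successfully. In a browser, access the public IP address bound to the ELB in step 3. If the following page is displayed, the request was forwarded by the ELB to a backend server in another VPC.
       Figure 8 Cross-VPC backend servers added successfully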
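  • Modifying security group rules (optional)
    A security group is essentially a set of access policies for network traffic, consisting of inbound rules and outbound rules. These rules protect the instances in the security group, such as cloud servers, cloud containers, and cloud databases, that have the same protection requirements and trust each other.
    If the security group policy associated with your instance does not meet your needs, for example, if you need to add, modify, or delete a TCP port, make the changes as described below.
    Adding a security group rule: If your services require a TCP port to be opened, see "Adding a Security Group Rule" to add an inbound rule that opens the specified port.
    Modifying a security group rule: Improperly configured security group rules can create serious security risks. See "Modifying a Security Group Rule" to correct unreasonable rules and keep cloud servers and other instances secure.
    Deleting a security group rule: If the source or destination address of an inbound or outbound rule changes, or a port no longer needs to be open, see "Deleting a Security Group Rule" to delete the rule.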