Fitur ClickHouse untuk pengembang tingkat lanjut. Alexey Milovidov (2018)



Dalam laporan tersebut, direncanakan untuk mempertimbangkan fitur-fitur ClickHouse yang kurang dikenal atau tidak cukup terlindungi dalam dokumentasi: agregasi dan manipulasi tambahan dengan status fungsi agregat, penyalinan antar-cluster, menjalankan kueri tanpa menggunakan server, dll. Contoh dari praktik mengembangkan layanan Yandex akan diberikan: cara mendapatkan hasil maksimal dari sistem.







Halo! Nama saya Aleksey! Saya melakukan ClickHouse. Dan hari ini saya ingin memberi tahu Anda tentang beberapa peluang yang menurut saya menarik, tetapi tidak semua orang tahu.



Misalnya pengambilan sampel. Dan, dengan demikian, kunci pengambilan sampel. Ini adalah kemampuan untuk melakukan perkiraan kueri. Beberapa orang tahu. Sudah bagus.



ClickHouse, , . - clickstream. , , -. , , ..



, , . . - .





. , , pupkin.narod.ru; , yandex.ru. . ? .



. , . MergeTree, .





, . CREATE TABLE. ORDER BY – . , , . . , . . , , Hash . . . hash .





, . - . , . .



4,5 , . . 3,5 . . 4,5 .





1/10. from . 0,6 . , - , 2,5 . , , , . , – . , 10 . , .





, ?



  • -, . . , .
  • . , unix timestamp.
  • timestamp, ( , ). , , DevOps, . . , .
  • - . , .
  • , .
  • , url, hash . , , , . , url . - hash-.
  • - .
  • . , - .
  • , . , . - hash. , . ? . - , . , - , .
  • . – , . . .




, , :



  • -, , , hash-, . . , , .
  • . , . . . . . 1/10 hashes .
  • , .




:



  • – 1/10 .
  • – — - , , 1 000 000. . , , 1 000 000, . , , , , . . . 1 000 000 10 000 000 1 000 000 000? – _sample_factor. : x _sample_factor, , .
  • – SAMPLE OFFSET. . 1/10 , : « , , 1/10 ». : SAMPLE 1/10 OFFSET 1/10. .
  • . , , . , . , , . ? , , , , , overhead.




– . , , , count distinct, uniq. uniq 4 .



. . , If. , : sumIf. , . – , , – , - UInt8. , , . , , .





? – . ..



Google. .





– Array. , Array. , , . .



. . , .



groupArray, . , … . . . .



groupUniqArray. .



Array - , , .



, groupArrayArray? , , . .



groupUniqArrayArray – , .





, . , . , , , , , .





. . . sum, Array, If , , . . . sumArrayIf SumIfArray.



? . , . – , , – : . sumIfArray. . . Array sumIf .



– sumArrayIf. , – : sumArray .





. : sumForEachStateForEachIfArrayIfState. , , , . . , .





. , , . , . .



, , . . .



, - , .



, count distinct? Hash table. hash‑ .



, , . : « , , ». - .



, ClickHouse . –State, . , , AggregateFunction - .



, . . . AggtregateFunction. , , .



–Merge. , , .



, .







uniq . .





. –State. - , . - . , UTF-8. , , UTF-8.





? AggregateFunction . . – , – .





. , , . ?





– .



, ClickHouse – , – . clickstream, , , . , . ClickHouse , .



. - , , .



. AggregatingMergeTree. , . . , - . . , . . , , count distinct, .



, . ClickHouse, , .





. ?



  • – , . , . , . , ClickHouse-, . .
  • , , , , – . sum sumIf – , .
  • , *. , , , arrayReduce. , , , , . . –State, .


* 2020 , initializeAggregation.





ClickHouse – .



, ClickHouse , . . INSERT, , . . INSERT , . , .



conflict-free , update. INSERT. , .



mulit-master. . , . , , . . . exactly-once .



, . , INSERT, , , , .





. , INSERT.



. INSERT, SELECT, .



– INSERT. , , . insert_quorum = 2. .



, INSERT , , INSERT, .



SELECT , select_sequential_consistency. , . select_linearizability, .



, SELECT, ZooKeeper. SELECT , , . . INSERT, .



, , - , . , , .



, . , , , .





. .



, , GROUP BY, , , - . , .



. . . .





. progress-bar, . ClickHouse- – , . , , . , 9,31 , 10 .



, : « 10 ?». . .





, . , , . - 0,5 . , , .



- , 10 . , , . , - .





: « max_memory_usage »?





, . . , production .





– GROUP BY . ? , - . , . , . , . . . . . - . , .



, :



  • – , .
  • – merge, . . merge , .


. dataset - . . datasets - , buckets. .





. max_memory_usage 10 . 8 (max_bytes_before_external_group_by = 8 000 000 000, distributed_aggregation_memory_efficient = 1). progress-bar - . . ? , . , , .



. , , . , , . . . , .



, . . . , 10 GROUP BY, , 1 .



, , , .





. .



, ClickHouse – - geospatial-, .



– . . , , .



? , . , - , . , , .



? , .



pointInPolygon. – : lat, lon, . *. - .



* 2020 , .



, . , -, . , . . .



, pointInPolygon, . pointInElliplses, ellipses . , , , . , . . 0 1, ellipses.



– greatCircleDistance. *. . , .



* 2020 , geoDistance, WGS-84 .





https://events.yandex.ru/lib/talks/5330/



– ClickHouse . ClickHouse. , , - . ClickHouse. .



. . , . modelEvaluate.



, , .



, CTR . . , : https://events.yandex.ru/lib/talks/5330/. .



. CatBoost. CatBoost? .





, ?



  • - , . , - , . *. , . - , , . .
  • , , ClickHouse . . , – . , , . , . , ClickHouse. . .
  • , , -, . . .


* ClickHouse.





. ClickHouse ClickHouse-.



- . ClickHouse. - . .



? grep, sed, awk perl. , , - , ClickHouse , . ClickHouse - , . . clickhouse-local.



. – , . . . , ClickHouse - .



, . , , JSON, JSONEachRow. stdin . , , . , , - ClickHouse. , , . , awk, perl, sed. , grep, . . .



, , , , – ClickHouse-. . .





ClickHouse-, ClickHouse-local , . , ClickHouse- .





? , ?



Date DateTime. Date , , ISO 8601, DateTime , - , . , , , *.



, . parseDateTimeBestEffort. , . , . . , , , .



* , date_time_input_format.



, , Hadoop . ClickHouse-local MapReduce jobs Hadoop.



, Parquet. pull request. , *.



* , — !



, , trash SQL. , - regexp’ , clickhouse-local. , Awk, , ClickHouse*.



* — Regexp.





, , .



? ClickHouse . . . , , . , .



? . , INSERT SELECT . . , .



, . . , , , , , – , , . - . .



, ClickHouse-copier. , . Zookeeper. – , .



. . . , .



- , . - .





– production.



.. 538 . 240 . , . .



. , . . lz4, zstd . . , . ClickHouse-copier. , , . - .



, - . , , . , , , , .



, !







! . . real time , ?



. , .



. ., , , ? ? ?



. , , , . - . , . . . . , . . . . , . .



. . ClickHouse?



, ClickHouse. . . . , . , , . , , , .



! Avro, Parquet, MapReduce machine learning, - , ClickHouse - jobs YARN, Mesos? . . , Spark, , , date locality , ?



, , , . , Hadoop MapReduce YT. , ClickHouse YT, , . - , .



. . . YT - ? . . , ClickHouse, tool CSV job?



*. , - . ClickHouse-, - , ClickHouse, YT. .



* .



! ClickHouse-local - - - , ?



, standalone-. – ClickHouse. . . ClickHouse, , , local, , .



! . groupUniqArrayArray order , ?



, , , .



! ! ClickHouse. . Vertica. , . ? CSV. CSV ClickHouse. ClickHouse , . . , , , . , , . - ?



ClickHouse JOIN , ? *. ClickHouse hash JOIN, . . , hash- . . . , , - . , merge JOIN, , .



* , join_algorithm. , , .



! , , master-master , , , ? , , ?



– , pull request. . , . Master-master . -, ClickHouse ZooKeeper. . , , , - - .



! ! ClickHouse-copier , , , ?



ClickHouse-copier , operation-. . , , , . . . , . . , -, . . , ZooKeeper. – *.



* , .




All Articles