Dalam studi ini, saya ingin melihat peningkatan kinerja apa yang dapat diperoleh dengan menggunakan sumber data ClickHouse daripada PostgreSQL. Saya tahu manfaat kinerja apa yang saya dapatkan saat menggunakan ClickHouse. Apakah manfaat ini tetap ada jika saya mengakses ClickHouse dari PostgreSQL menggunakan pembungkus data eksternal (FDW)?
Lingkungan database yang dipelajari adalah PostgreSQL v11, clickhousedb_fdw, dan database ClickHouse. Akhirnya, dari PostgreSQL v11, kami akan menjalankan berbagai kueri SQL yang dirutekan melalui clickhousedb_fdw ke database ClickHouse. Kemudian kita akan melihat bagaimana kinerja FDW dibandingkan dengan kueri yang sama yang dieksekusi di PostgreSQL asli dan ClickHouse asli.
Database Clickhouse
ClickHouse adalah sistem manajemen basis data berbasis kolom sumber terbuka yang dapat mencapai kinerja 100-1000 kali lebih cepat daripada pendekatan basis data tradisional, yang mampu memproses lebih dari satu miliar baris dalam waktu kurang dari satu detik.
Clickhousedb_fdw
clickhousedb_fdw - ClickHouse External Database Wrapper, atau FDW, adalah proyek open source dari Percona. Berikut ini tautan ke repositori GitHub proyek .
Pada bulan Maret, saya menulis blog yang memberi tahu Anda lebih banyak tentang PLRT Asing kami .
Seperti yang akan Anda lihat, ini menyediakan FDW untuk ClickHouse, yang memungkinkan SELECT from, dan INSERT INTO, database ClickHouse dari server PostgreSQL v11.
FDW , aggregate join. .
Benchmark environment
- Supermicro server:
- Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
- 2 sockets / 28 cores / 56 threads
- Memory: 256GB of RAM
- Storage: Samsung SM863 1.9TB Enterprise SSD
- Filesystem: ext4/xfs
- OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
- PostgreSQL: version 11
Benchmark tests
, - , , , « , » 1987 2018 . , .
85 , 109 .
Benchmark Queries
, ClickHouse, clickhousedb_fdw PostgreSQL.
Q# | Query Contains Aggregates and Group By |
---|---|
Q1 | SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC; |
Q2 | SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC; |
Q3 | SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10; |
Q4 | SELECT Carrier, count() FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count() DESC; |
Q5 | SELECT a.Carrier, c, c2, c1000/c2 as c3 FROM ( SELECT Carrier, count() AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier,count(*) AS c2 FROM ontime WHERE Year=2007 GROUP BY Carrier)b on a.Carrier=b.Carrier ORDER BY c3 DESC; |
Q6 | SELECT a.Carrier, c, c2, c1000/c2 as c3 FROM ( SELECT Carrier, count() AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) b on a.Carrier=b.Carrier ORDER BY c3 DESC; |
Q7 | SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier; |
Q8 | SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year; |
Q9 | select Year, count(*) as c1 from ontime group by Year; |
Q10 | SELECT avg(cnt) FROM (SELECT Year,Month,count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Year,Month) a; |
Q11 | select avg(c1) from (select Year,Month,count(*) as c1 from ontime group by Year,Month) a; |
Q12 | SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10; |
Q13 | SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10; |
Query Contains Joins | |
Q14 | SELECT a.Year, c1/c2 FROM ( select Year, count()1000 as c1 from ontime WHERE DepDelay>10 GROUP BY Year) a INNER JOIN (select Year, count(*) as c2 from ontime GROUP BY Year ) b on a.Year=b.Year ORDER BY a.Year; |
Q15 | SELECT a.”Year”, c1/c2 FROM ( select “Year”, count()1000 as c1 FROM fontime WHERE “DepDelay”>10 GROUP BY “Year”) a INNER JOIN (select “Year”, count(*) as c2 FROM fontime GROUP BY “Year” ) b on a.”Year”=b.”Year”; |
Table-1: Queries used in benchmark
Query executions
: PostgreSQL , ClickHouse clickhousedb_fdw. .
Q# | PostgreSQL | PostgreSQL (Indexed) | ClickHouse | clickhousedb_fdw |
---|---|---|---|---|
Q1 | 27920 | 19634 | 23 | 57 |
Q2 | 35124 | 17301 | 50 | 80 |
Q3 | 34046 | 15618 | 67 | 115 |
Q4 | 31632 | 7667 | 25 | 37 |
Q5 | 47220 | 8976 | 27 | 60 |
Q6 | 58233 | 24368 | 55 | 153 |
Q7 | 30566 | 13256 | 52 | 91 |
Q8 | 38309 | 60511 | 112 | 179 |
Q9 | 20674 | 37979 | 31 | 81 |
Q10 | 34990 | 20102 | 56 | 148 |
Q11 | 30489 | 51658 | 37 | 155 |
Q12 | 39357 | 33742 | 186 | 1333 |
Q13 | 29912 | 30709 | 101 | 384 |
Q14 | 54126 | 39913 | 124 | 1364212 |
Q15 | 97258 | 30211 | 245 | 259 |
Table-1: Time taken to execute the queries used in benchmark
, X , Y . ClickHouse , postgres clickhousedb_fdw, . , PostgreSQL ClickHouse, ClickHouse clickhousedb_fdw.
ClickhouseDB clickhousedb_fdw. FDW , Q12. ORDER BY. - ORDER BY GROUP/BY ORDER BY ClickHouse.
2 Q12 Q13. , ORDER BY. , Q-14 Q-15 ORDER BY . ORDER BY 259 , ORDER BY — 1364212. , .
Q15: Without ORDER BY Clause
bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2
FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";
Q15: Query Without ORDER BY Clause
QUERY PLAN
Hash Join (cost=2250.00..128516.06 rows=50000000 width=12)
Output: fontime."Year", (((count(*) * 1000)) / b.c2)
Inner Unique: true Hash Cond: (fontime."Year" = b."Year")
-> Foreign Scan (cost=1.00..-1.00 rows=100000 width=12)
Output: fontime."Year", ((count(*) * 1000))
Relations: Aggregate on (fontime)
Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
-> Hash (cost=999.00..999.00 rows=100000 width=12)
Output: b.c2, b."Year"
-> Subquery Scan on b (cost=1.00..999.00 rows=100000 width=12)
Output: b.c2, b."Year"
-> Foreign Scan (cost=1.00..-1.00 rows=100000 width=12)
Output: fontime_1."Year", (count(*))
Relations: Aggregate on (fontime)
Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"(16 rows)
Q14: Query With ORDER BY Clause
bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b ON a."Year"= b."Year"
ORDER BY a."Year";
Q14: Query Plan with ORDER BY Clause
QUERY PLAN
Merge Join (cost=2.00..628498.02 rows=50000000 width=12)
Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
Inner Unique: true Merge Cond: (fontime."Year" = fontime_1."Year")
-> GroupAggregate (cost=1.00..499.01 rows=1 width=12)
Output: fontime."Year", (count(*) * 1000)
Group Key: fontime."Year"
-> Foreign Scan on public.fontime (cost=1.00..-1.00 rows=100000 width=4)
Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10))
ORDER BY "Year" ASC
-> GroupAggregate (cost=1.00..499.01 rows=1 width=12)
Output: fontime_1."Year", count(*) Group Key: fontime_1."Year"
-> Foreign Scan on public.fontime fontime_1 (cost=1.00..-1.00 rows=100000 width=4)
Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC(16 rows)
, ClickHouse , clickhousedb_fdw ClickHouse PostgreSQL. clickhousedb_fdw , , ClickHouse. , fdw PostgreSQL .
Clickhouse https://t.me/clickhouse_ru
PostgreSQL https://t.me/pgsql