We have prepared a translation of a two-part article by Ryan Sears on processing Google's Certificate Transparency logs. The first part gives an overview of the log structure and provides sample Python code for parsing entries from these logs. The second part covers retrieving all certificates from the available logs and setting up Google BigQuery to store the data and run searches over it.
Three years have passed since the original was written, and the number of available logs, and with it the number of entries in them, has grown many times over. It is now all the more important to process the logs correctly if the goal is to collect as much data as possible.
Part 1. Parsing Certificate Transparency Logs Like a Boss
While developing our first project, phisfinder, I spent a lot of time thinking about the anatomy of phishing attacks and the data sources that would let us identify traces of upcoming phishing campaigns before they can cause real damage.
One of the sources we have integrated (and definitely one of the best) is the Certificate Transparency Log (CTL), a project started by Ben Laurie and Adam Langley at Google. In essence, a CTL is a log containing an immutable list of certificates issued by CAs, stored in a Merkle tree, which allows each certificate to be verified cryptographically if needed.
To get a sense of scale, the following script counts the certificates in each CTL:
import locale
import requests

locale.setlocale(locale.LC_ALL, 'en_US')
ctl_log = requests.get('https://www.gstatic.com/ct/log_list/log_list.json').json()
total_certs = 0
# locale.format() was removed in Python 3.12; format_string() is the replacement
human_format = lambda x: locale.format_string('%d', x, grouping=True)
for log in ctl_log['logs']:
    log_url = log['url']
    try:
        # get-sth returns the signed tree head, which includes the number of entries
        log_info = requests.get('https://{}/ct/v1/get-sth'.format(log_url), timeout=3).json()
        total_certs += int(log_info['tree_size'])
    except (requests.RequestException, ValueError, KeyError):
        # Skip logs that are unreachable or return a malformed response
        continue
    print("{} has {} certificates".format(log_url, human_format(log_info['tree_size'])))
print("Total certs -> {}".format(human_format(total_certs)))
Output:
ct.googleapis.com/pilot has 92,224,404 certificates
ct.googleapis.com/aviator has 46,466,472 certificates
ct1.digicert-ct.com/log has 1,577,183 certificates
ct.googleapis.com/rocketeer has 89,391,361 certificates
ct.ws.symantec.com has 3,562,198 certificates
ctlog.api.venafi.com has 94,797 certificates
vega.ws.symantec.com has 200,401 certificates
ctserver.cnnic.cn has 5,081 certificates
ctlog.wosign.com has 1,387,492 certificates
ct.startssl.com has 293,374 certificates
ct.googleapis.com/skydiver has 1,249,079 certificates
ct.googleapis.com/icarus has 48,585,765 certificates
Total certs -> 285,037,607
That is 285,037,607 certificates in total.
CTL
CTLs are served over HTTP. A request for a single entry returns the following:
// curl -s 'https://ct1.digicert-ct.com/log/ct/v1/get-entries?start=0&end=0' | jq .
{
  "entries": [
    {
      "leaf_input": "AAAAAAFIyfaldAAAAAcDMIIG/zCCBeegAwIBAgI...",
      "extra_data": "AAiJAAS6MIIEtjCCA56gAwIBAgIQDHmpRLCMEZU..."
    }
  ]
}
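Both `leaf_input` and `extra_data` are base64-encoded. According to RFC 6962, `leaf_input` is a MerkleTreeLeaf structure and `extra_data` is a PrecertChainEntry (for X509 entries, `extra_data` holds the certificate chain instead).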
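To make the layout concrete, here is a minimal sketch that unpacks the fixed-size header of a MerkleTreeLeaf with Python's standard `struct` module. It uses only the first 16 base64 characters of the sample `leaf_input` (exactly 12 bytes), so nothing beyond the truncated sample is assumed. The function name `parse_leaf_header` is illustrative and not part of any library.

```python
import base64
import struct

# First 16 base64 characters of the sample `leaf_input`: exactly 12 bytes.
LEAF_PREFIX = "AAAAAAFIyfaldAAA"


def parse_leaf_header(raw: bytes) -> dict:
    """Unpack the fixed 12-byte header of a MerkleTreeLeaf (RFC 6962)."""
    # > big-endian | B version | B leaf type | Q timestamp in ms | H entry type
    version, leaf_type, timestamp_ms, entry_type = struct.unpack(">BBQH", raw[:12])
    return {
        "version": version,
        "leaf_type": leaf_type,
        "timestamp_ms": timestamp_ms,
        "entry_type": entry_type,  # 0 = x509_entry, 1 = precert_entry
    }


header = parse_leaf_header(base64.b64decode(LEAF_PREFIX))
print(header)
```

For an x509 entry, the bytes that follow the header are a 3-byte length and the DER-encoded certificate.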
PreCerts
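PreCerts deserve a short explanation, since the RFC is not easy to follow on this point. A PreCert is a version of a certificate that the CA submits to the CTL before it issues the final certificate. It is an x509 v3 certificate that carries a critical `poison` extension, so it cannot be validated as an ordinary certificate. When parsing a log, the entry type tells you whether the x509/ASN.1 data you are looking at is a PreCert.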
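As an illustration of the `poison` extension, the sketch below DER-encodes its OID (1.3.6.1.4.1.11129.2.4.3, as given in RFC 6962) and looks for those bytes in a certificate. The helper names `encode_oid` and `looks_like_precert` are assumptions made for this example, and a plain byte search is only a heuristic: a real check should parse the extensions with an x509 library.

```python
POISON_OID = "1.3.6.1.4.1.11129.2.4.3"  # precertificate poison extension


def encode_oid(dotted: str) -> bytes:
    """DER-encode an OBJECT IDENTIFIER given in dotted notation."""
    arcs = [int(part) for part in dotted.split(".")]
    body = bytearray([arcs[0] * 40 + arcs[1]])  # first two arcs share one byte
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]  # lowest 7 bits, no continuation flag
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)  # higher groups set the flag
            arc >>= 7
        body.extend(reversed(chunk))
    # Short-form length only, which is enough for OIDs under 128 bytes.
    return bytes([0x06, len(body)]) + bytes(body)


def looks_like_precert(der: bytes) -> bool:
    """Heuristic: True if the encoded poison OID appears anywhere in `der`."""
    return encode_oid(POISON_OID) in der


print(encode_oid(POISON_OID).hex())
```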
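Parsing binary structures is a familiar task from CTF competitions. It can be done with the standard `struct` module, but the Construct library makes it much more convenient. The structures needed to parse a log entry look like this: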
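# ctl_parser_structures.py
from construct import Struct, Byte, Int16ub, Int64ub, Enum, Bytes, Int24ub, this, GreedyBytes, GreedyRange, Terminated

# Header of a MerkleTreeLeaf (the decoded `leaf_input`)
MerkleTreeHeader = Struct(
    "Version" / Byte,
    "MerkleLeafType" / Byte,
    "Timestamp" / Int64ub,
    "LogEntryType" / Enum(Int16ub, X509LogEntryType=0, PrecertLogEntryType=1),
    "Entry" / GreedyBytes
)

# A single certificate: 3-byte length followed by ASN.1 (DER) data
Certificate = Struct(
    "Length" / Int24ub,
    "CertData" / Bytes(this.Length)
)

# A certificate chain: 3-byte total length followed by certificates
CertificateChain = Struct(
    "ChainLength" / Int24ub,
    "Chain" / GreedyRange(Certificate),
)

# `extra_data` of a PreCert entry: the leaf certificate followed by its chain.
# The chain fields are listed inline because `Embedded` was removed in Construct 2.10.
PreCertEntry = Struct(
    "LeafCert" / Certificate,
    "ChainLength" / Int24ub,
    "Chain" / GreedyRange(Certificate),
    Terminated
)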
import json
import base64
import ctl_parser_structures
from OpenSSL import crypto

entry = json.loads("""
{
  "entries": [
    {
      "leaf_input": "AAAAAAFIyfaldAAAAAcDMIIG/zCCBeegAwIBAgIQ...",
      "extra_data": "AAiJAAS6MIIEtjCCA56gAwIBAgIQDHmpRLCMEZUg..."
    }
  ]
}
""")['entries'][0]

leaf_cert = ctl_parser_structures.MerkleTreeHeader.parse(base64.b64decode(entry['leaf_input']))
print("Leaf Timestamp: {}".format(leaf_cert.Timestamp))
print("Entry Type: {}".format(leaf_cert.LogEntryType))

if leaf_cert.LogEntryType == "X509LogEntryType":
    # The entry is a regular X509 certificate, stored in the leaf itself
    cert_data_string = ctl_parser_structures.Certificate.parse(leaf_cert.Entry).CertData
    chain = [crypto.load_certificate(crypto.FILETYPE_ASN1, cert_data_string)]
    # The rest of the chain is stored in `extra_data`
    extra_data = ctl_parser_structures.CertificateChain.parse(base64.b64decode(entry['extra_data']))
    for cert in extra_data.Chain:
        chain.append(crypto.load_certificate(crypto.FILETYPE_ASN1, cert.CertData))
else:
    # The entry is a PreCert: the leaf certificate and the chain are both in `extra_data`
    extra_data = ctl_parser_structures.PreCertEntry.parse(base64.b64decode(entry['extra_data']))
    chain = [crypto.load_certificate(crypto.FILETYPE_ASN1, extra_data.LeafCert.CertData)]
    for cert in extra_data.Chain:
        chain.append(
            crypto.load_certificate(crypto.FILETYPE_ASN1, cert.CertData)
        )
X509 leaf_input
As you can see, Construct makes it easy to define and parse binary structures in Python.
, , CTL , - .
Part 2. Retrieving, Storing and Querying 250M+ Certificates Like a Boss
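According to the RFC, log entries are retrieved with the `get-entries` endpoint. It accepts a range of entries (the `start` and `end` parameters), but most logs return at most 64 entries per request. The CTLs run by Google, which hold most of the certificates, return up to 1024 entries per request.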
Google (Argon, Xenon, Aviator, Icarus, Pilot, Rocketeer, Skydiver) 32 , , , .
1024 , CTL, Google, 256 .
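Because each request returns only a limited batch, a full download has to walk the log in ranges. The sketch below splits a log of a given size into inclusive `start`/`end` pairs and builds the matching `get-entries` URLs. The helper names `entry_ranges` and `get_entries_url` are assumptions made for this example, and the batch size is left as a parameter because it differs from log to log.

```python
from typing import Iterator, Tuple


def entry_ranges(tree_size: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) index pairs that cover the whole log."""
    for start in range(0, tree_size, batch_size):
        # `end` is inclusive, so the last index of the log is tree_size - 1
        yield start, min(start + batch_size, tree_size) - 1


def get_entries_url(log_url: str, start: int, end: int) -> str:
    """Build a get-entries URL for one batch."""
    return "https://{}/ct/v1/get-entries?start={}&end={}".format(log_url, start, end)


for start, end in entry_ranges(10, 4):
    print(get_entries_url("ct1.digicert-ct.com/log", start, end))
```

A log may return fewer entries than requested, so a real client should count the entries it received and continue from there rather than trust the planned ranges.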
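The task is both IO-bound (fetching entries over http) and CPU-bound (parsing certificates), so it benefits from combining concurrency with multiple processes.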
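To download the certificates from every CTL, Google's included, we wrote Axeman. It uses asyncio and aioprocessing to download and parse the certificates and writes the results to CSV files.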
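The general shape of such a downloader can be sketched with the standard library alone. This is not Axeman's code: `fetch_batch` and `parse_batch` are hypothetical stand-ins for the HTTP request and the certificate parsing, and a thread pool is used here only to keep the example self-contained. For parsing that is really CPU-heavy, a process pool would be the natural swap-in.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fetch_batch(start, end):
    """Hypothetical stand-in for an HTTP get-entries request."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return list(range(start, end + 1))


def parse_batch(entries):
    """Hypothetical stand-in for certificate parsing."""
    return [index * 2 for index in entries]


async def download_all(tree_size, batch_size, max_in_flight=4):
    loop = asyncio.get_running_loop()
    limit = asyncio.Semaphore(max_in_flight)  # cap concurrent requests

    with ThreadPoolExecutor(max_workers=2) as pool:

        async def worker(start):
            end = min(start + batch_size, tree_size) - 1
            async with limit:
                entries = await fetch_batch(start, end)
            # Hand parsing to the executor so the event loop keeps fetching
            return await loop.run_in_executor(pool, parse_batch, entries)

        batches = await asyncio.gather(
            *(worker(start) for start in range(0, tree_size, batch_size))
        )
    # gather() preserves the order of the submitted batches
    return [item for batch in batches for item in batch]


if __name__ == "__main__":
    print(asyncio.run(download_all(10, 4)))
```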
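We started a Google Cloud VM with 16 cores, 32 GB of memory and a 750 GB SSD (thanks to Google for the $300 of free credit!) and ran Axeman, which saved the results to `/tmp/certificates/$CTL_DOMAIN/`.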
?
Postgres, , , Postgres 250 ( , 20 !), , :
, , (AWS RDS, Heroku Postgres, Google Cloud SQL) . , , .
Another option was a map/reduce solution such as Spark or Hadoop Pig. Looking at "big data" offerings more broadly, we settled on Google BigQuery.
BigQuery
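Data gets into BigQuery by way of Google's storage, using Google's gsutil tool. Once `gsutil` is installed and set up with `gsutil config`, upload the certificates to a storage bucket: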
gsutil -o GSUtil:parallel_composite_upload_threshold=150M \
-m cp \
/tmp/certificates/* \
gs://all-certificates
:
BigQuery:
. , BigQuery β, β, CTL , . ( ):
For the table schema, choose "Edit as Text" and paste the following:
[
  {
    "name": "url",
    "type": "STRING",
    "mode": "REQUIRED"
  },
  {
    "mode": "REQUIRED",
    "name": "cert_index",
    "type": "INTEGER"
  },
  {
    "mode": "REQUIRED",
    "name": "chain_hash",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "cert_der",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "all_dns_names",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "not_before",
    "type": "FLOAT"
  },
  {
    "mode": "REQUIRED",
    "name": "not_after",
    "type": "FLOAT"
  }
]
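Since every column in this schema is `REQUIRED`, it can be worth checking rows before uploading them. The sketch below mirrors the schema in Python and converts one CSV row to typed values. It assumes the CSV columns follow the schema order, and the sample row contains made-up placeholder values, not real log data.

```python
import csv
import io

# Column names and Python types mirroring the BigQuery schema above
SCHEMA = [
    ("url", str),
    ("cert_index", int),
    ("chain_hash", str),
    ("cert_der", str),
    ("all_dns_names", str),
    ("not_before", float),
    ("not_after", float),
]


def coerce_row(values):
    """Convert one CSV row to a dict of typed values, or raise ValueError."""
    if len(values) != len(SCHEMA):
        raise ValueError("expected {} columns, got {}".format(len(SCHEMA), len(values)))
    row = {}
    for (name, cast), value in zip(SCHEMA, values):
        if value == "":
            raise ValueError("{} is REQUIRED".format(name))
        row[name] = cast(value)  # int() and float() raise ValueError on bad input
    return row


# Placeholder row for illustration only
sample = "ct.googleapis.com/pilot,0,PLACEHOLDER_HASH,PLACEHOLDER_DER,example.com www.example.com,1412137657.0,1443673657.0\n"
row = coerce_row(next(csv.reader(io.StringIO(sample))))
print(row["cert_index"], row["not_before"])
```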
. , ( , , ). :
.
As a first example, let's find all domain names that use punycode:
SELECT
  all_dns_names
FROM
  [ctl-lists:certificate_data.scan_data]
WHERE
  (REGEXP_MATCH(all_dns_names,r'\b?xn\-\-'))
  AND NOT all_dns_names CONTAINS 'cloudflare'
In about 15 seconds we get every punycode domain in the known CTLs!
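Punycode names are hard to read in their encoded form, so it helps to decode them when reviewing results. A minimal sketch using Python's built-in `idna` codec follows. The helper names and the sample domain are assumptions made for this example, not values taken from the logs.

```python
import re

# A label is punycode-encoded when it starts with the "xn--" prefix
PUNYCODE_LABEL = re.compile(r"(?:^|\.)xn--", re.IGNORECASE)


def has_punycode(name: str) -> bool:
    """True if any label of the domain name is punycode-encoded."""
    return bool(PUNYCODE_LABEL.search(name))


def to_unicode(name: str) -> str:
    """Decode a punycode domain name to its Unicode form."""
    return name.encode("ascii").decode("idna")


name = "xn--mnchen-3ya.de"  # illustrative sample
if has_punycode(name):
    print(name, "->", to_unicode(name))
```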
Another example: all Coinbase certificates recorded in Certificate Transparency:
SELECT
  all_dns_names
FROM
  [ctl-lists:certificate_data.scan_data]
WHERE
  (REGEXP_MATCH(all_dns_names,r'.*\.coinbase.com[\s$]?'))
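To see what this pattern does and does not catch, it can be tried locally with Python's `re` module. This sketch assumes that `REGEXP_MATCH` matches anywhere in the string, as `re.search` does, and that the two regex dialects agree on this pattern. All names in the list are invented for illustration.

```python
import re

# Same pattern as in the query above, unchanged
PATTERN = re.compile(r".*\.coinbase.com[\s$]?")

# Invented names for illustration only
names = [
    "www.coinbase.com",                # subdomain: matches
    "coinbase.com",                    # no dot before "coinbase": no match
    "www.coinbase.com.login.example",  # lookalike prefix: matches
    "api.coinbase-com.example",        # unescaped "." also matches "-"
    "mycoinbase.com",                  # no dot before "coinbase": no match
]

matches = [name for name in names if PATTERN.search(name)]
print(matches)
```

The pattern is deliberately loose: besides genuine subdomains, it also catches lookalike names that merely contain `.coinbase.com` or something close to it.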
:
- , - .
One domain kept showing up in the results: `flowers-to-the-world.com`. Let's see how many certificates each log holds for it:
SELECT
  url,
  COUNT(*) AS total_certs
FROM
  [ctl-lists:certificate_data.scan_data]
WHERE
  (REGEXP_MATCH(all_dns_names,r'.*flowers-to-the-world.*'))
GROUP BY
  url
ORDER BY
  total_certs DESC
Whois shows that the domain belongs to Google, and it turns out to be related to Certificate Transparency itself.
, . Certificate Transparency.
`flowers-to-the-world.com` Google. , CTL RFC6962. , .
, , , , , .
The certificates for `flowers-to-the-world.com` all chain to the same root: "C=GB, ST=London, O=Google UK Ltd., OU=Certificate Transparency, CN=Merge Delay Monitor Root"
, .
β NetLas.io. , , , .
, , . , . , β , . Netlas.io " ". β .