How to set up Snowplow for mobile analytics

Using Snowplow will reduce your analytics costs. This is the first article in the series, with detailed instructions for setting up the whole pipeline that carries events from a mobile app into a Redshift database. The next article will look in detail at building dashboards on top of the collected data.








The Startup Founder's Guide to Analytics by Tristan Handy makes a very good introduction to this article; a translation is available on Habr: https://habr.com/ru/post/346326/



The author recommends Snowplow as the analytics tool:



“Migrate from your existing analytics and event tracking to Snowplow Analytics. Snowplow does everything the paid tools do, but it is open source. You can host it yourself (and pay only for your EC2 instances), or pay Snowplow or Fivetran to host the event collector for you. If you don't make the jump at this point, you will miss out on collecting more granular data and set yourself up for some very large Segment, Heap, or Mixpanel bills down the road. Once you pass this stage, the paid tools can easily charge you $10,000 a month.”



, . Simo Ahava snowplow,

, snowplow



snowplow

, 2 .





:





?



, :



gambar5

:



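  • The tracker sends events to the collector, the Clojure Collector.
  • The collector is a web application running on AWS Elastic Beanstalk, with its domain routed through AWS Route 53.
  • The collector's logs are stored in AWS S3.
  • An ETL (extract, transform, load) process running on AWS EMR processes the logs and writes the results back to S3.
  • The processed data is loaded into AWS Redshift.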


, .



Step 0: Setting up AWS IAM





  • AWS

    . ,

    .



AMR



, , Amazon Web Services IAM (Identity and Access

Management) , .



, https://aws.amazon.com/

.



image12



, .



(IAM)



, , :

IAM.



IAM Snowplow
.



In the Services menu, select IAM.



gambar39

  • Open «Groups».
  • Click «Create new group».


image24

  • Name the group snowplow-setup and click «Next step».
  • Skip the «Attach Policy» screen for now by clicking «Next step».
  • Click «Create Group».


Next, open «Policy».



  • Click «Create Policy».


gambar17

  • Switch to the JSON tab and paste the following policy:


  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "acm:*",
          "autoscaling:*",
          "aws-marketplace:ViewSubscriptions",
          "aws-marketplace:Subscribe",
          "aws-marketplace:Unsubscribe",
          "cloudformation:*",
          "cloudfront:*",
          "cloudwatch:*",
          "ec2:*",
          "elasticbeanstalk:*",
          "elasticloadbalancing:*",
          "elasticmapreduce:*",
          "es:*",
          "iam:*",
          "rds:*",
          "redshift:*",
          "s3:*",
          "sns:*"
        ],
        "Resource": "*"
      }
    ]
  }
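
Before pasting a policy into the IAM console, it can help to confirm that the JSON parses and to see which services it opens up. The following is a minimal Python sketch and not part of the original setup; the function name `summarize_policy` and the trimmed `EXAMPLE_POLICY` are illustrative assumptions.

```python
import json


def summarize_policy(policy_text):
    """Parse an IAM policy document and return the sorted services it allows."""
    policy = json.loads(policy_text)  # raises ValueError if the JSON is malformed
    services = set()
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement["Action"]
        if isinstance(actions, str):  # IAM accepts a single string or a list
            actions = [actions]
        for action in actions:
            # "s3:*" -> "s3": the service prefix comes before the colon
            services.add(action.split(":", 1)[0])
    return sorted(services)


# Shortened example in the same shape as the policy above (assumption: trimmed for brevity).
EXAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:*", "redshift:*", "elasticmapreduce:*"],
      "Resource": "*"
    }
  ]
}
"""
```

Calling `summarize_policy` on the full policy text gives a quick list of every service the snowplow-setup group will be able to administer.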


gambar49

  • Click «Review policy».
  • Name the policy snowplow-setup-policy-infrastructure.
  • Click «Create Policy».


Go back to «Groups» and open the «snowplow-setup» group you created.



gambar37

  • On the Permissions tab, click «Attach Policy».
  • Select snowplow-setup-policy-infrastructure and click «Attach Policy».


gambar19

image20

Go to «Users» and click «Add user».



  • Name the user snowplow-setup.
  • Select «Programmatic access».
  • Click «Next: Permissions».
  • Choose «Add user to group», select snowplow-setup, and click «Next: Tags».
  • Click «Next: Review».
  • Click «Create user».


image58

, , – . CSV, «Download .csv».



gambar61

, , . , , .



, 0 !





  • AWS.
  • IAM- snowplow-setup .


Step 1: Clojure Collector





  • DNS




Clojure Collector — , web-endpoint, . -, Apache Tomcat, AWS Elastic Beanstalk. Clojure Collector Tomcat AWS S3, , Clojure Collector, .



Clojure Collector



, , WAR Clojure Collector.



. clojure-collector-1.X.X-standalone.war.



, Elastic Beanstalk.



In the AWS Services menu, select Elastic Beanstalk.



gambar26

, AWS, Snowplow, . , . .



gambar9

Elastic Beanstalk



  • Click «Create Application».
  • Enter an application name (for example, Snowplow Clojure Collector).
  • For Platform, choose Tomcat: Tomcat 8.5 with Java 8 running on 64bit Amazon Linux.
  • Under Application Code, choose «Upload your code» and upload the WAR file.


gambar31

gambar46

  • «Create application»
  • ,


image50



gambar28

Clojure Collector , . , Applications cookie sp. , .



gambar6

! Clojure Collector.



.



S3



Tomcat S3 – . -, HTTP-, , .



S3, Elastic Beanstalk. Elastic Beanstalk AWS.



  • .
  • Click «Edit» in the «Software Configuration» section.


gambar3

  • Under «S3 log storage», enable «Rotate logs».


gambar1

, , S3 ETL.



«Apply», .





, Elastic Beanstalk - auto-scalable, .



  • Open «Configuration» for the environment.
  • In the «Capacity» section, click «Edit».
  • «Environment Type» , «Load balanced», , .


gambar54

, .



Elastic Beanstalk SSL



.



  • In the AWS Services menu, select «Route 53».
  • Click «Create hosted zone».
  • In Domain Name, enter your domain, e.g. snowplow.denjoy.ru. Select «Public Hosted Zone» and click «Create hosted zone».


gambar21

  • . NS. .


gambar33

  • , NS , cloudflare.

  • 4 NS- . CloudFlare:


gambar47

, NS- snowplow.denjoy.ru, NS AWS. .



-, , https://dnschecker.org/.



, , Route 53, . , Route 53 Elastic Beanstalk. , URL- snowplow.denjoy.ru , DNS AWS, - Clojure Collector. !



  • , «Create Record».


image32

  • Choose «Simple Routing».
  • Click «Define simple record».
  • In the window that opens, leave the Record name field empty. In the Value / Route traffic to field, choose "Alias to Elastic Beanstalk environment", then choose the region in the next field. In the Record type field, choose the "A" record type, and click the "Define simple record" button in the bottom corner of the window.


image7

  • After the window closes, click the "Create records" button.


Now, if you open http://snowplow.denjoy.ru/i in a browser, you will see the same pixel as when you opened the Clojure Collector page. So the domain routing works!
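
The same check can be scripted instead of done by eye in a browser. The sketch below is an illustration only: it assumes the collector answers the /i request with a GIF image, and the helper names `looks_like_pixel` and `fetch` are made up for this example.

```python
from urllib.request import urlopen

# Every GIF file starts with one of these two signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_pixel(status, content_type, body):
    """Return True if the response is a successful GIF image."""
    media_type = content_type.split(";")[0].strip().lower()
    return (
        status == 200
        and media_type == "image/gif"
        and body.startswith(GIF_SIGNATURES)
    )


def fetch(url, timeout=10):
    """Fetch a URL and return (status, content type, body). Needs network access."""
    with urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        return response.status, content_type, response.read()
```

With network access, `looks_like_pixel(*fetch("http://snowplow.denjoy.ru/i"))` should come back true once the DNS records have propagated; substitute your own domain.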



But we are not done yet.



Setting up HTTPS for the Clojure Collector



() SSL- AWS Load Balancer. , Route 53, . SSL



  • In the Services menu, select AWS Certificate Manager. Under «Provision certificates», click «Get started».
  • Choose «Request a public certificate».
  • Enter your domain name, e.g. snowplow.denjoy.ru, and click «Next».
  • Choose «DNS validation».
  • Tags
  • Click «Review», then «Confirm and request».
  • . , AWS , «Create record in Route 53»


gambar40

  • Click «Create».


gambar35

Create . «Continue» . 30 , !



Load Balancer HTTPS



  • Elastic Beanstalk, «Configuration». !
  • In the «Load balancer» section, click «Edit».
  • Under «Listeners», click «Add listener».
  • Set Port to 443, fill in the remaining listener settings, and click «Add».


gambar25

  • Click «Apply».


!



Snowplow Clojure Collector (, ).



, , .



— . Route 53, .





  • Clojure Collector, Elastic Beanstalk.
  • , Amazon Route 53.
  • SSL- .
  • Tomcat S3. S3 .


Step 2: The tracker



Android Tracker . Tracker Demo, , , «Ok» .



, https://snowplow.denjoy.ru, HTTPS «Start». .





gambar4



gambar44

.



Clojure Collector, Elastic Beanstalk, Tomcat S3. , S3



gambar16

S3 elasticbeanstalk-region-id. resources / environment / logs / publish / (some ID) / (some ID). Some ID – , , e-ab12cd23ef, , , i-1234567890. gzip.



, _var_log_tomcat8_rotated_localhost_access_log.txt123456789.gz – , ETL .
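
To pick out the rotated access logs from a bucket listing, the file-name pattern above can be expressed as a regular expression. This is a small Python sketch under the assumption that the rotated files always follow the naming shown in the text; the key list is made up for illustration.

```python
import re

# Matches rotated Tomcat access logs such as
# _var_log_tomcat8_rotated_localhost_access_log.txt123456789.gz
ROTATED_ACCESS_LOG = re.compile(
    r"_var_log_tomcat8_rotated_localhost_access_log\.txt\d+\.gz$"
)


def rotated_access_logs(keys):
    """Return only the S3 keys that look like rotated Tomcat access logs."""
    return [key for key in keys if ROTATED_ACCESS_LOG.search(key)]


# Hypothetical listing using the placeholder IDs from the text.
EXAMPLE_KEYS = [
    "resources/environments/logs/publish/e-ab12cd23ef/i-1234567890/"
    "_var_log_tomcat8_rotated_localhost_access_log.txt123456789.gz",
    "resources/environments/logs/publish/e-ab12cd23ef/i-1234567890/"
    "_var_log_tomcat8_rotated_catalina.out123456789.gz",
]
```

Running `rotated_access_logs(EXAMPLE_KEYS)` keeps the first key and drops the second, which is a quick way to see whether any logs are waiting to be processed.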



image13

, . HTTP- 200. , , Clojure Collector . . :



gambar27

, JSON .



gambar51

Step 3: ETL





  • Clojure Collector.
  • IAM, 0 .




.



, , AWS Elastic MapReduce (EMR).



  • Tomcat.
  • , IP-.
  • , schema JSON.

  • , , Amazon Redshift.


. , ETL S3-. , , . Tomcat , , .



Java- EmrEtlRunner . ETL Amazon Elastic MapReduce. , EmrEtlRunner . , , , 60 .



EmrEtlRunner



ETL — Unix, . , , snowplow_emr_rXX, XX — . snowplow_emr_r117_biskupin.zip.



  • ZIP- snowplow-emr-etl-runner . .
  • Snowplow Github , SQL, .

  • , , snowplow-emr-etl-runner , :


git clone https://github.com/snowplow/snowplow.git


Git, .



gambar56

  • snowplow-emr-etl-runner snowplow .
  • config targets.
  • :

    • Copy snowplow/3-enrich/emr-etl-runner/config/config.yml.sample to config/config.yml.
    • Copy snowplow/3-enrich/config/iglu_resolver.json to config/iglu_resolver.json.
    • Copy snowplow/4-storage/config/targets/redshift.json to config/targets/redshift.json.




gambar55

:



  |-- snowplow-emr-etl-runner
  |-- snowplow
  | |-- -SNOWPLOW GIT REPO HERE-
  |-- config
  | |-- iglu_resolver.json
  | |-- config.yml
  | |-- targets
  | | |-- redshift.json
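
A misplaced or misnamed file in this layout only shows up later as an EmrEtlRunner error, so it can be worth checking the folder first. The following Python sketch assumes the layout shown above; `missing_files` is a hypothetical helper, not part of Snowplow.

```python
from pathlib import Path

# Files the layout above expects, relative to the project folder.
REQUIRED_FILES = [
    "snowplow-emr-etl-runner",
    "config/config.yml",
    "config/iglu_resolver.json",
    "config/targets/redshift.json",
]


def missing_files(project_dir):
    """Return the required files that are not present under project_dir."""
    root = Path(project_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Running `missing_files(".")` from the project folder should return an empty list when everything is in place.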
  


EC2



Amazon EC2. ETL Amazon, Amazon EC2. ETL , , .



  • In the AWS Services menu, select EC2 and open «Key Pairs».
  • , , . .
  • , , «Create key pair».


gambar8

  • . denjoy-snowplow.
  • pem
  • , , <key pair name>.pem .


gambar30

S3



Amazon S3. ETL.



:



  • :raw:in — . - elasticbeanstalk, Clojure Collector’, Elastic Beanstalk.
  • :processing — .
  • :archive — : :raw ( ), :enriched ( ) :shredded ( ).
  • :enriched — : :good ( ), :bad ( , ).
  • :shredded — : :good ( , ), :bad ( , ).
  • :log — , ETL.


, S3, Services AWS S3.



:raw:in , elasticbeanstalk-.



, « » ETL.



«Create bucket» , denjoy-snowplow-data. S3, snowplow. «Next» , , , «Create bucket».



, . :



image10

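Use «Create folder» to create the following folders:



  • archive
  • shredded
  • enriched


gambar15

Inside archive, create these subfolders:



  • raw
  • enriched
  • shredded


Inside each of the other two folders, enriched and shredded, create: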



  • good
  • bad


, , :



  |-- elasticbeanstalk-region-id
  |-- denjoy-snowplow-data
  | |-- archive
  | | |-- raw
  | | |-- enriched
  | | |-- shredded
  | |-- enriched
  | | |-- good
  | | |-- bad
  | |-- shredded
  | | |-- good
  | | |-- bad
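
S3 has no real directories: a "folder" is just a key prefix ending in a slash. As a hedged illustration, the Python sketch below generates the prefixes for the data bucket shown above, so the list can be handed to whatever S3 client you prefer. The function name is an assumption made for this example.

```python
def data_bucket_folders():
    """Return the folder prefixes of the data bucket, in the order shown above."""
    folders = []
    for sub in ("raw", "enriched", "shredded"):
        folders.append(f"archive/{sub}/")
    for stage in ("enriched", "shredded"):
        for outcome in ("good", "bad"):
            folders.append(f"{stage}/{outcome}/")
    return folders
```

The seven prefixes it returns correspond one-to-one to the leaf folders in the tree.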
  


Create one more S3 bucket, e.g. denjoy-snowplow-log. It will hold the logs of the ETL process.



EmrEtlRunner



EmrEtlRunner. config.yml , snowplow config/. :



  • snowplow-setup , 0. , AWS IAM.

  • AWS. , Python/pip, Mac OS X, Homebrew. , Homebrew, brew install awscli AWS.



, awscli, aws configure . , , , , eu-west-1.



  $ aws configure
  AWS Access Key ID: <enter your IAM user Access Key ID here>
  AWS Secret Access Key: <enter your IAM user Secret Access Key here>
  Default region name: <enter the region name, e.g. eu-west-1 here>
  Default output format: <just press enter>
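
`aws configure` stores the keys in an INI-style credentials file (by default ~/.aws/credentials) under a `[default]` section. The Python sketch below checks that such a file contains both keys; it works on the file's text so it can be tried without touching real credentials, and the sample values are placeholders, not real keys.

```python
import configparser


def has_default_credentials(credentials_text):
    """Return True if the text has a [default] profile with both AWS keys set."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    if not parser.has_section("default"):
        return False
    section = parser["default"]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )


# Placeholder content in the shape `aws configure` writes (values are fake).
EXAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = exampleSecretValue
"""
```

To check the real file, read it with `pathlib.Path.home() / ".aws" / "credentials"` and pass its text to the function.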
  


After aws configure, run aws emr create-default-roles. This creates the default EMR roles that EmrEtlRunner relies on, including the one used for EC2.



EmrEtlRunner!



EmrEtlRunner



EmrEtlRunner — snowplow-emr-etl-runner.



EmrEtlRunner . . . , 13, rdb_load. . .



EmrEtlRunner config.yml, config. , , , .



  aws:
    access_key_id: AKIAIBAWU2NAYME55123
    secret_access_key: iEmruXM7dSbOemQy63FhRjzhSboisP5TcJlj9123
    s3:
      region: eu-west-1
      buckets:
        assets: s3://snowplow-hosted-assets
        jsonpath_assets:
        log: s3://simoahava-snowplow-log
        raw:
          in:
            - s3://elasticbeanstalk-eu-west-1-375284143851/resources/environments/logs/publish/e-f4pdn8dtsg
          processing: s3://simoahava-snowplow-data/processing
          archive: s3://simoahava-snowplow-data/archive/raw
        enriched:
          good: s3://simoahava-snowplow-data/enriched/good
          bad: s3://simoahava-snowplow-data/enriched/bad
          errors:
          archive: s3://simoahava-snowplow-data/archive/enriched
        shredded:
          good: s3://simoahava-snowplow-data/shredded/good
          bad: s3://simoahava-snowplow-data/shredded/bad
          errors:
          archive: s3://simoahava-snowplow-data/archive/shredded
    emr:
      ami_version: 5.9.0
      region: eu-west-1
      jobflow_role: EMR_EC2_DefaultRole
      service_role: EMR_DefaultRole
      placement:
      ec2_subnet_id: subnet-d6e91a9e
      ec2_key_name: simoahava
      bootstrap: []
      software:
        hbase:
        lingual:
      jobflow:
        job_name: Snowplow ETL
        master_instance_type: m1.medium
        core_instance_count: 2
        core_instance_type: m1.medium
        core_instance_ebs:
          volume_size: 100
          volume_type: "gp2"
          volume_iops: 400
        ebs_optimized: false
        task_instance_count: 0
        task_instance_type: m1.medium
        task_instance_bid: 0.015
      bootstrap_failure_tries: 3
      configuration:
        yarn-site:
          yarn.resourcemanager.am.max-attempts: "1"
        spark:
          maximizeResourceAllocation: "true"
      additional_info:
  collectors:
    format: clj-tomcat
  enrich:
    versions:
      spark_enrich: 1.12.0
    continue_on_unexpected_error: false
    output_compression: NONE
  storage:
    versions:
      rdb_loader: 0.14.0
      rdb_shredder: 0.13.0
      hadoop_elasticsearch: 0.1.0
  monitoring:
    tags: {}
    logging:
      level: DEBUG
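
Because the sample above still points at someone else's buckets, a quick scan for bucket names can catch values you forgot to replace. This is a rough Python sketch that uses a regular expression rather than a YAML parser, so it needs nothing beyond the standard library; `bucket_names` is a hypothetical helper.

```python
import re

# Captures the bucket part of any s3:// URI (lowercase letters, digits, dots, hyphens).
S3_URI = re.compile(r"s3://([a-z0-9][a-z0-9.\-]*)")


def bucket_names(config_text):
    """Return the sorted, de-duplicated bucket names referenced in the config text."""
    return sorted(set(S3_URI.findall(config_text)))


# A few lines taken from the sample configuration above.
EXAMPLE_CONFIG = """
assets: s3://snowplow-hosted-assets
log: s3://simoahava-snowplow-log
good: s3://simoahava-snowplow-data/enriched/good
bad: s3://simoahava-snowplow-data/enriched/bad
"""
```

Any name in the result that is neither yours nor snowplow-hosted-assets is a leftover from the sample.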
  


, , , . -. , , .



:aws:access_key_id

IAM.
:aws:secret_access_key

IAM.
:aws:s3:region

, S3.
:aws:s3:buckets:log

S3, ETL.
:aws:s3:buckets:raw:in

, Tomcat. . ! , !
:aws:s3:buckets:raw:processing

.
:aws:s3:buckets:raw:archive

.
:aws:s3:buckets:enriched:good

.
:aws:s3:buckets:enriched:bad

.
:aws:s3:buckets:enriched:errors

.
:aws:s3:buckets:enriched:archive

.
:aws:s3:buckets:shredded:good

.
:aws:s3:buckets:shredded:bad

.
:aws:s3:buckets:shredded:errors

.
:aws:s3:buckets:shredded:archive

:aws:emr:region

, EC2.
:aws:emr:placement

.
:aws:emr:ec2_subnet_id

VDS, . , EC2, .
:aws:emr:ec2_key_name

EC2.
:collectors:format

clj-tomcat.
:monitoring:snowplow

(:method, :app_id :collector).


.



-, :aws:s3:buckets:raw:in . . , . , .



gambar38

To find the value for :aws:emr:ec2_subnet_id, select EC2 in the AWS Services menu. Open «Instances» and select the running instance. Copy its «subnet» value into :aws:emr:ec2_subnet_id.



image48

, .



, , , snowplow-emr-etl-runner.



./snowplow-emr-etl-runner run -c config/config.yml -r config/iglu_resolver.json


gambar57



Invalid InstanceProfile: EMR_EC2_DefaultRole.




ETL S3. .



ETL, AWS Redshift, !





  • snowplow-emr-etl-runner .
  • S3-.
  • ETL S3.


Step 4: Redshift





  • ETL .
  • S3-.
  • GUI SQL-. Table Plus, , . .





Redshift. Redshift — , AWS. , , Tomcat. SQL . , SQL, Codecademy, SQL!



:



  • Redshift.
  • .
  • EmrEtlRunner Redshift.


, , EmrEtlRunner, . SQL- ( ) Snowplow: .





In the AWS Services menu, select Amazon Redshift.



, ( , ). «Launch Cluster».



image52

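Enter a cluster name, e.g. snowplow-cluster. Enter a database name, e.g. snowplow.



For Node type choose dc2.large, and for Cluster type choose Single Node (1 node).



Leave the database port at its default (5439).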



-. , , . - — .



-.



, «Create cluster».



gambar53

.



. Redshift.



image18



, , , .



«Clusters» , .



On the «Properties» tab, under «Network and security», click the link under VPC security groups (for example, sg-c3f5c687).



gambar2

EC2.



.



«Inbound rules» , TCP- 5439 0.0.0.0/0 . , TCP- ( ).
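
After changing the inbound rules, it is easy to confirm that the port is reachable before setting up an SQL client. The Python sketch below simply attempts a TCP connection; the function name and the placeholder host in the usage note are assumptions for illustration.

```python
import socket


def port_is_open(host, port=5439, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Call it as `port_is_open("<your cluster endpoint>")`, using the endpoint host name of your own cluster. A false result usually means the security group rule has not been applied or the cluster is not publicly accessible.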



, .



gambar29

. Amazon Redshift . .



gambar41

Open your SQL client. This guide uses Table Plus. Click «Create new connection» and fill in:



  • Driver: Amazon Redshift (com.amazon.redshift.jdbc.Driver)
  • Host: endpoint
  • User: awsuser
  • Password: master_password
  • Database: snowplow


-, .



:



gambar34

«Connect», .



SELECT current_database(); «Run current», , . :



image60

– !





-, , Android Tracker. .sql , DDL, .



.sql , Snowplow:





Run atomic-def.sql in Table Plus. It creates the atomic schema and the atomic.events table.



image22

manifest-def.sql. .



DDL . , ETL , .



.sql :





, SQL- , :



SELECT * FROM pg_tables WHERE schemaname='atomic';


image63



:



  • storageloader, ETL.
  • power_user, , -.
  • read_only, .


SQL-. ($password) , + .



  CREATE USER storageloader PASSWORD '$password';
  GRANT USAGE ON SCHEMA atomic TO storageloader;
  GRANT INSERT ON ALL TABLES IN SCHEMA atomic TO storageloader;
  CREATE USER read_only PASSWORD '$password';
  GRANT USAGE ON SCHEMA atomic TO read_only;
  GRANT SELECT ON ALL TABLES IN SCHEMA atomic TO read_only;
  CREATE SCHEMA scratchpad;
  GRANT ALL ON SCHEMA scratchpad TO read_only;
  CREATE USER power_user PASSWORD '$password';
  GRANT ALL ON DATABASE snowplow TO power_user;
  GRANT ALL ON SCHEMA atomic TO power_user;
  GRANT ALL ON ALL TABLES IN SCHEMA atomic TO power_user;
  


, 12 .



image59

, , atomic storageLoader, .



, :



  SELECT 'ALTER TABLE atomic.' || tablename ||' OWNER TO storageloader;'
  FROM pg_tables WHERE schemaname='atomic' AND NOT tableowner='storageloader';
  


:



ALTER TABLE atomic.* OWNER TO storageloader;
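
The statements produced by that query follow a fixed pattern, so they can also be generated outside the database from a list of table names. The Python sketch below mirrors the pattern; apart from atomic.events, which is mentioned above, any table names you pass in are your own.

```python
def alter_owner_statements(table_names, owner="storageloader", schema="atomic"):
    """Build one ALTER TABLE ... OWNER TO statement per table name."""
    return [
        f"ALTER TABLE {schema}.{name} OWNER TO {owner};"
        for name in table_names
    ]
```

For example, `alter_owner_statements(["events"])` yields `ALTER TABLE atomic.events OWNER TO storageloader;`, ready to be pasted into the SQL client.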


.



image64

,



SELECT * FROM pg_tables WHERE schemaname='atomic' AND tableowner='storageloader';


.



, EmrEtlRunner ETL, storageloader- S3 Redshift.



IAM-



EmrEtlRunner Redshift RDB Loader ( ). , IAM-, Redshift S3-.



  • In the AWS Services menu, select IAM.
  • Go to Roles. Click «Create role».
  • Under «Select type of trusted entity», choose AWS service, then Redshift. Under «Select your use case», choose «Redshift - Customizable» and click «Next: Permissions».


image14

  • Find and select the AmazonS3ReadOnlyAccess policy. Click «Next: Tags».


gambar43

  • Click «Next: Review».
  • Enter a role name, e.g. RedshiftS3Access, and click «Create role».
  • Open the RedshiftS3Access role you just created and copy its Role ARN. You will need it later.


gambar11

  • Amazon Redshift .
  • Snowplow « IAM».


image23

  • In «Available IAM roles», select the role you created, click «Add IAM role» and then «Done», and wait for the change to be applied.


gambar36

Redshift



, 3, config/ targets/ redshift.json.



redshift.json , :



  {
    "schema": "iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/2-1-0",
    "data": {
      "name": "AWS Redshift enriched events storage",
      "host": "ADD HERE",
      "database": "ADD HERE",
      "port": 5439,
      "sslMode": "DISABLE",
      "username": "ADD HERE",
      "password": "ADD HERE",
      "roleArn": "ADD HERE",
      "schema": "atomic",
      "maxError": 1,
      "compRows": 20000,
      "sshTunnel": null,
      "purpose": "ENRICHED_EVENTS"
    }
  }
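
A forgotten "ADD HERE" placeholder only surfaces when the load step fails, so a quick check of the file can save an ETL run. The Python sketch below lists the fields still holding the placeholder; `unfilled_fields` is a hypothetical helper, and the example text is a shortened version of the file above.

```python
import json


def unfilled_fields(target_text, placeholder="ADD HERE"):
    """Return the sorted names of data fields that still hold the placeholder."""
    data = json.loads(target_text)["data"]
    return sorted(key for key, value in data.items() if value == placeholder)


# Shortened version of the target file above (assumption: trimmed for brevity).
EXAMPLE_TARGET = """
{
  "schema": "iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/2-1-0",
  "data": {
    "host": "ADD HERE",
    "database": "snowplow",
    "port": 5439,
    "username": "storageloader",
    "password": "ADD HERE",
    "roleArn": "ADD HERE"
  }
}
"""
```

An empty list from `unfilled_fields` means every placeholder has been replaced.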
  


, :



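  • host: the endpoint URL of the Redshift cluster
  • database: the database name
  • username: storageloader
  • password: the password of the storageloader user
  • roleArn: the ARN of the IAM role you created above.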


-.



EmrEtlRunner



, , EmrEtlRunner,

Redshift.



, ( snowplow-emr-etl-runner

):



./snowplow-emr-etl-runner run -c config/config.yml -r config/iglu_resolver.json -t config/targets


:raw:in (, Tomcat)

, , Redshift. ,

.



- :



gambar62



read_only .



gambar42

, , , , (

), ,





, Snowplow.



  • Amazon, , DNS

    AWS.

  • Clojure Collector — , HTTP- Tomcat

    S3-.

  • ETL, ,

    S3.

  • , ETL , ,

    AWS Redshift.



, , , - –

, -.



, , , .



Discourse

Snowplow
— , , .



!



