Study Guide: Data Engineering

Tools

Data Engineering Concepts

  • ETL (Extract, Transform, Load)
  • Application Database Architecture (donnemartin’s awesome app)
  • Distributed Computing
  • Information Retrieval
  • Signal detection and estimation
  • Computer Networks

Shell/Command Language

  • Bash
  • Unix
  • Linux

Programming Languages

  • Java
  • Scala
  • Ruby
  • Haskell
  • Other: PHP, C#, .NET, Postgres, C++, Sage, RPY

Scripting Languages

  • Python
  • R

Distributed Dynamic Programming Languages

  • Julia
  • Clojure

Collection and Ingestion Tools / ETL Frameworks / Distributed Computing Frameworks

  • Apache Kafka
  • Apache Flink
  • Fluentd
  • Embulk
  • Luigi
  • Airflow
  • Azkaban
  • Akka (toolkit for the JVM)

Storage and Management Tools

  • Amazon Redshift
  • Hadoop HDFS (HBase, Hive)
  • Google BigQuery
  • NoSQL, MySQL, PostgreSQL
  • MariaDB
  • MongoDB
  • Data Warehouse

Data Processing Tools

  • Hive
  • Spark
  • Hadoop MapReduce
  • SQL (declarative language)
  • ELK Stack

Application Development Tools

  • Django
  • JavaScript

Cloud

  • AWS
  • Google Cloud
  • Microsoft Azure

Data Visualization

  • Redash
  • Tableau
  • Others: Grafana, Metabase, Superset

Books

Online Courses

Certifications

Side Project Ideas

Online Resources
