
Data Engineering

Pipeline Orchestration

Tools

Cron [J]

Airflow

r

https://airflow.apache.org/docs/apache-airflow/stable/howto/index.html

Astronomer is a very useful resource: https://www.astronomer.io/guides/

a

Luigi

r

https://luigi.readthedocs.io/en/stable/

Prefect

Rundeck

Dagster

r

https://docs.dagster.io/guides

Metaflow

Cloud

AWS Step Functions [M]

Google Cloud Composer

Azure Data Factory

Data Storage

RDBMS

Classic

r

https://www.sql-ex.ru/

a

MS SQL Server [J]

r

Database course taught at UrFU in 2022: https://www.youtube.com/playlist?list=PLuYsCpx95Allwadi6NMeUYjGg31g7UsPP

PostgreSQL [M]

MySQL [J]

Oracle

Distributed

Cloud

AWS Redshift

MS Synapse

Google BigQuery

Snowflake

YDB

Arenadata

On premise

Citus

Apache Hive

ClickHouse

Non-RDBMS

Apache Hadoop

r

In practice, a user only needs to know two commands:
hdfs dfs -get remote local
hdfs dfs -put local remote
HDFS architecture (no need to study the Java API): https://stepik.org/lesson/15482/step/1?unit=4233
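
The two HDFS commands mentioned here can also be driven from Python. Below is a minimal sketch, assuming the standard `hdfs dfs` command-line client is installed and on PATH; the helper names `hdfs_put_cmd` and `hdfs_get_cmd` and the example paths are hypothetical, introduced only for illustration.

```python
import shlex
from typing import List


def hdfs_put_cmd(local_path: str, remote_path: str) -> List[str]:
    """Build the argument list for uploading a local file to HDFS."""
    return ["hdfs", "dfs", "-put", local_path, remote_path]


def hdfs_get_cmd(remote_path: str, local_path: str) -> List[str]:
    """Build the argument list for downloading a file from HDFS."""
    return ["hdfs", "dfs", "-get", remote_path, local_path]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch
    # works on a machine without a Hadoop client.
    print(shlex.join(hdfs_put_cmd("data.csv", "/user/me/data.csv")))
    print(shlex.join(hdfs_get_cmd("/user/me/data.csv", "data_copy.csv")))
```

To actually execute a command, the returned list could be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting issues with paths that contain spaces.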

MongoDB [M]

MS Cosmos DB

Apache Ignite

Apache Cassandra

AWS DynamoDB

Google Firebase

Neo4j

ArangoDB

Storage

Cloud

AWS S3 [J]

MA

Google Drive

MS Blob Storage

MinIO S3

r

Installing MinIO: https://docs.min.io/docs/minio-docker-quickstart-guide.html

MS Sharepoint

MS OneDrive

Ya Disk

Sber Disk

Protocols

FTP [J]

SFTP [J]

SCP [J]

r

User-level usage.
Copy from the local machine to the remote machine:
scp local_path username@host:remote_path
Copy from the remote machine to the local machine:
scp username@host:remote_path local_path
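
The same two scp directions can be scripted from Python. This is a minimal sketch, assuming an OpenSSH-style `scp` client is available; the helper names `scp_upload_cmd` and `scp_download_cmd` and the example host and paths are hypothetical.

```python
import shlex
from typing import List


def scp_upload_cmd(local_path: str, username: str, host: str, remote_path: str) -> List[str]:
    """Build the argument list for copying a local file to a remote machine."""
    return ["scp", local_path, f"{username}@{host}:{remote_path}"]


def scp_download_cmd(username: str, host: str, remote_path: str, local_path: str) -> List[str]:
    """Build the argument list for copying a remote file to the local machine."""
    return ["scp", f"{username}@{host}:{remote_path}", local_path]


if __name__ == "__main__":
    # Only print the commands; running them needs a reachable host
    # and valid credentials.
    print(shlex.join(scp_upload_cmd("report.csv", "alice", "example.org", "/tmp/report.csv")))
    print(shlex.join(scp_download_cmd("alice", "example.org", "/tmp/report.csv", "report.csv")))
```

The only difference between the two directions is the argument order: the source always comes first and the destination second. To run a command, the list could be handed to `subprocess.run(cmd, check=True)`.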

S3 [J]

WebDAV

HTTP [M]

ETL

Languages

Python

requests [M]

sqlalchemy [M]

MA

pandas

psycopg2 / 3

bonobo

BeautifulSoup

lxml

dask

r

https://docs.dask.org/en/stable/10-minutes-to-dask.html

Scrapy

connectorX

Scala

Cats

Java

r

A mediocre course; suitable only if you know nothing at all about Java: https://stepik.org/course/497/syllabus

R

r

https://stepik.org/course/497/syllabus

Go [J]

MA

Tools

Pentaho DI

Talend

MA

MS Data Factory

MS SSIS

Databricks

Informatica

AWS Glue

AWS DMS

MA

MQ

Rabbit

Kafka

AWS SQS

IBM MQ

AWS MQ

MA

Software Engineering

Algorithms & Data Structures [J]

Deploy & Code Maintenance

CI / CD

GitLab CI/CD

r

https://docs.gitlab.com/ee/ci/

GitHub Actions [J]

MA

r

https://docs.github.com/en/actions/learn-github-actions

Jenkins

TeamCity

Git

GitHub [M]

Bitbucket [J]

GitLab [J]

Code quality

Linters

Python

Flake8 [J]

wemake

mypy

MA

pycodestyle

Testing

Python

pytest [J]

pytest-cov

hypothesis

unittest

Formatters

Python

black [J]

pre-commit

MLOps

Data tracking & Quality

DVC

r

https://dvc.org/doc/start

CML

pandera

pydantic [J]

Experiment tracking

ClearML

r

https://clear.ml/docs/latest/docs/getting_started/ds/ds_first_steps

MLflow

r

https://www.mlflow.org/docs/latest/quickstart.html

Serving

bentoml

flask [MJ]

FastAPI [MJ]

r

https://fastapi.tiangolo.com/tutorial/first-steps/

Languages

Scala

Python [MJ]

Data Science

a

IT Infrastructure

Hosting & Serverless computing

Cloud

Linux [M]

Windows

Azure Windows Server

AWS EC2 [J]

AWS Lightsail

Ya.Cloud

SberCloud

Selectel

Containers

Docker [J]

docker-compose

Container hosting

AWS ECS [J]

MA

Azure Databricks

Kubernetes

MA

On premise

Linux [M]

Windows [M]

Authorization

MS Active Directory [J]

LDAP [J]

AWS IAM

MA

Load Balancer

Azure Traffic Manager

Nginx [J]

AWS Elastic Load Balancing

Citrix ADC

HAProxy

Kubernetes

MA

DNS [J]

LAN [J]

WAN [J]

Server

AWS Lightsail

MA

AWS Route 53

Google Public DNS

Yandex DNS

Domain

Registration

Routing delegation

Structure

Zones

DNS Records

A

AAAA

CNAME

MX

NS

PTR

SSL Certificates

Monitoring [J]

Zabbix

Grafana

Kibana

Prometheus

Mail

IaC

Agile & Communication