Categories: All - nlp - tools - statistics

by Miroslav Alkhimovich, 2 years ago

ф

This map catalogs tools and methods for data-centric work, from team communication and IT infrastructure to data science, software engineering, and data engineering. Statistics and data science are the foundation, built on languages such as Python and R and libraries such as NumPy and SciPy.

Agile & Communication

Presentations
LaTeX Beamer [J]
Google Slides
MS PowerPoint
Diagrams
Figma
Mindomo
PlantUML
Miro
MS Visio
Gliffy
Draw.io
Corporate Social Network
Yammer
Communication
MS Outlook
MS Teams
Knowledge sharing
Confluence
Task trackers
Trello
Jira

IT Infrastructure

IaC
AWS CloudFormation
Terraform [M]
Ansible
Mail
AWS Workmail
Yandex Mail
Google Gmail
Monitoring [J]
Prometheus
Kibana
Grafana
Zabbix
DNS [J]
WAN [J]

SSL Certificates

Multi domain

Wildcard

OV

EV

DV

DNS Records

PTR

NS

MX

CNAME

AAAA

A

Zones

Domain

Structure

Routing delegation

Registration

Server

Yandex DNS

Google Public DNS

AWS Route 53

LAN [J]
Load Balancer
HAProxy
Citrix ADC
AWS Elastic Load Balancing
Nginx [J]
Azure Traffic Manager
Authorization
AWS IAM
LDAP [J]
MS Active Directory [J]
Hosting & Serverless computing

Windows [M]

Containers

Container hosting

Kubernetes

Azure Databricks

AWS ECS [J]

Docker [J]

docker-compose

Windows

Selectel

SberCloud

Ya.Cloud

AWS Lightsail

AWS EC2 [J]

Azure Windows Server

Linux [M]

Data Science

BI
Ya Datalens
Metabase
Redash
Spotfire
Tableau
Streamlit
QlikView
Qlik Sense
MS Power BI
Data Mining

MS Excel [J]

SQL [M]

Plotly

Matplotlib

Seaborn

polars

Pandas

NumPy

numba

Machine Learning
Reinforcement learning

Temporal difference

Dynamic programming

Monte Carlo

Bandit

Recommendation Systems

Collaborative Filtering

Computer Vision

PyTorchCV

timm

Pytesseract

SimpleITK

Mahotas

Detection / Segmentation

pytorch-toolbelt

segmentation-models

detectron2

Image manipulation

Scikit-Image

Pillow

OpenCV

NLP

Embeddings

fasttext

Gensim

Qdrant

Faiss

HuggingFace

pyMorphy2

spaCy

Textblob

Razdel

Natasha

NLTK

Deep learning

Jax

Keras

PyTorch

PyTorch Lightning

TensorFlow

Classic ML

Gradient Boosting

XGBoost

CatBoost

LightGBM

Vowpal Wabbit

Scikit-Learn

Linear models

Lasso

Ridge

Logistic Regression

Clustering

Agglomerative

DBSCAN

KMeans

EDA
sweetviz
pandas-profiling
Statistics

statsmodels

Pingouin

SciPy

Software Engineering

Python [MJ]
Deployment & Code Maintenance
MLOps

Serving

FastAPI [MJ]

https://fastapi.tiangolo.com/tutorial/first-steps/

flask [MJ]

bentoml

Experiment tracking

MLFlow

https://www.mlflow.org/docs/latest/quickstart.html

ClearML

https://clear.ml/docs/latest/docs/getting_started/ds/ds_first_steps

Data tracking & Quality

pydantic [J]

pandera

CML

DVC

https://dvc.org/doc/start

Code quality

pre-commit

Formatters

black [J]

Testing

unittest

hypothesis

pytest [J]

pytest-cov

Linters

pycodestyle

mypy

Flake8 [J]

wemake

Git

GitLab [J]

Bitbucket [J]

GitHub [M]

CI / CD

TeamCity

Jenkins

GitHub Actions [J]

https://docs.github.com/en/actions/learn-github-actions

GitLab CI/CD

https://docs.gitlab.com/ee/ci/

Algorithms & Data Structures [J]

Data Engineering

MQ
AWS MQ
IBM MQ
AWS SQS
Kafka
RabbitMQ
ETL

AWS DMS

AWS Glue

Informatica

Databricks

MS SSIS

MS Data Factory

Talend

Pentaho DI

Languages

Go [J]

R

https://stepik.org/course/497/syllabus

Java

A mediocre course; only suitable if you know nothing at all about Java

https://stepik.org/course/497/syllabus

Scala

Cats

Python

connectorX

Scrapy

dask

https://docs.dask.org/en/stable/10-minutes-to-dask.html

lxml

BeautifulSoup

bonobo

psycopg2 / 3

pandas

sqlalchemy [M]

requests [M]

Data Storage
Storage

Protocols

HTTP [M]

WebDAV

S3 [J]

SCP [J]

User-level usage


Copying from the local machine to a remote machine

scp local_path username@host:remote_path


Copying from a remote machine to the local machine

scp username@host:remote_path local_path
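
The two copy directions can also be scripted. Below is a minimal Python sketch, assuming the `scp` client is installed and on PATH; the helper names `remote_spec`, `build_scp_command`, and `run_scp` are hypothetical and exist only for illustration. The `-r` flag is the standard scp option for copying directories recursively.

```python
import subprocess
from typing import List


def remote_spec(username: str, host: str, path: str) -> str:
    """Format a remote location the way scp expects: username@host:path."""
    return f"{username}@{host}:{path}"


def build_scp_command(source: str, destination: str, recursive: bool = False) -> List[str]:
    """Build the scp argument list; -r copies directories recursively."""
    command = ["scp"]
    if recursive:
        command.append("-r")
    command += [source, destination]
    return command


def run_scp(source: str, destination: str, recursive: bool = False) -> None:
    """Run scp; raises CalledProcessError if scp exits with a non-zero status."""
    subprocess.run(build_scp_command(source, destination, recursive), check=True)


# Local -> remote (hypothetical user, host, and paths):
#   run_scp("report.csv", remote_spec("alice", "example.org", "/tmp/report.csv"))
# Remote -> local:
#   run_scp(remote_spec("alice", "example.org", "/tmp/report.csv"), "report.csv")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.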

SFTP [J]

FTP [J]

Sber Disk

Ya Disk

MS OneDrive

MS Sharepoint

MinIO S3

Installing MinIO

https://docs.min.io/docs/minio-docker-quickstart-guide.html

MS Blob Storage

Google Drive

AWS S3 [J]

Non-RDBMS

ArangoDB

Neo4j

Google Firebase

AWS DynamoDB

Apache Cassandra

Apache Ignite

MS Cosmos DB

MongoDB [M]

Apache Hadoop

As a user, you really only need to know two commands

hdfs dfs -get remote local

hdfs dfs -put local remote
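
The same two operations can be wrapped from Python. This is a minimal sketch, assuming the Hadoop `hdfs` client is installed and on PATH; it uses the `hdfs dfs -get` and `hdfs dfs -put` forms of the file system shell. The helper names `hdfs_get_command`, `hdfs_put_command`, and `run_hdfs` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def hdfs_get_command(remote: str, local: str) -> List[str]:
    """Arguments for downloading a path from HDFS to the local file system."""
    return ["hdfs", "dfs", "-get", remote, local]


def hdfs_put_command(local: str, remote: str) -> List[str]:
    """Arguments for uploading a local path to HDFS."""
    return ["hdfs", "dfs", "-put", local, remote]


def run_hdfs(command: List[str]) -> None:
    """Run an hdfs command, failing early if the client is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("hdfs client not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(command, check=True)


# Hypothetical paths, for illustration only:
#   run_hdfs(hdfs_get_command("/data/events.parquet", "./events.parquet"))
#   run_hdfs(hdfs_put_command("./events.parquet", "/data/events.parquet"))
```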


HDFS architecture (no need to study the Java API)

https://stepik.org/lesson/15482/step/1?unit=4233

RDBMS

Distributed

On-premises

Clickhouse

Apache Hive

Citus

Arenadata

YDB

Snowflake

Google BigQuery

MS Synapse

AWS Redshift

Classic

https://www.sql-ex.ru/

Oracle

MySQL [J]

PostgreSQL [M]

MS SQL Server [J]

Databases course taught at UrFU (Ural Federal University) in 2022

https://www.youtube.com/playlist?list=PLuYsCpx95Allwadi6NMeUYjGg31g7UsPP

Pipeline Orchestration
Cloud

Azure Data Factory

Google Cloud Composer

AWS Step Functions [M]

Tools

Metaflow

Dagster

https://docs.dagster.io/guides

Rundeck

Prefect

Luigi

https://luigi.readthedocs.io/en/stable/

Airflow

https://airflow.apache.org/docs/apache-airflow/stable/howto/index.html


Astronomer is a very useful resource!

https://www.astronomer.io/guides/

Cron [J]