LLMs Learning Path | マインドモ・マインドマップ

Communicative Language Teaching.

Victoria Solにより

Mindamap

Gurjan Sahotaにより

Process map Brainstormings

Mindomo Teamにより

C#

Mike Tonにより

Generative AI Systems Development

2. Knowledge Integration & Reasoning

1. Workflow & Architecture Design

Floating topic

Prompt Engineering

Code Generation and Execution

Chain of Code (CoC) Prompting [Li et al., 2023b]

Structured Chain-of-Thought (SCoT) Prompting [Li et al., 2023c]

Program of Thoughts (PoT) Prompting [Chen et al., 2022]

Fine-Tuning and Optimization

Automatic Prompt Engineer (APE) [Zhou et al., 2022]

User Interaction

Active-Prompt [Diao et al., 2023]

Reduce Hallucination

ReAct Prompting [Yao et al., 2022]

Retrieval Augmented Generation (RAG) [Lewis et al., 2020]

Reasoning and Logic

Graph-of-Thought (GoT) Prompting [Yao et al., 2023b]

Least-to-Most Prompting [Denny Zhou et al. 2023]

Tree-of-Thoughts (ToT) Prompting [Yao et al., 2023a]

Self-Consistency [Wang et al., 2022]

Auto-CoT [Zhang et al., 2022]

Chain-of-Thought (CoT) Prompting [Wei et al., 2022]

New Tasks no Extensive Training

Few-shot Prompting [Brown et al., 2020]

Zero-shot Prompting [Radford et al., 2019]

Planning / Estimation AI Agent + Knowlege Extraction Maturity Level #1

Inference / Deploymnt Phase

Edge

Modeling/ Prototyping Phase

Open LLMs

Edge Decentralized

OpenAI Services

Edge

AI Agent System Abilities

Personal and Collaborative

Execution and Interaction

Planning and Decision Making

Perceiving and Predictive Modeling

Self-learning and Continuous Improvement

LLM-Based Agent

Action

Feedback Loop

Environment Interaction

Response Generation

Brain

Tool Interface

Knowledge Integration

Transferability & Generalization

Reasoning & Planning Layer

Memory Capabilities and Retrival

Core LLM Capabilities

Perception

Preprocessing

Input Modalities

Context Integration

LLM-Based Agent Barin LLM as a Main Part

QA Types

Special Types of QA

QA Types Based on Interaction

Multiple-Choice QA

Yes/No QA

Contextual QA (Clarification-Based)

Conversational QA

Methodological Distinctions (Based on Answer Generation)

Knowledge Based QA

Rule Based QA

Retrieval-Augmented QA

Extractive QA

Generative QA

Abstractive QA

UC -Data

Personalized Rec

Semi-Personalized Rec

Category Recommendation

Popularity Based Recommendation

Reviews Sentiment Labeling

Reviews Summarization

Tag Based Search

Tag Analysis and/or Generation

Similarity Search

Business Sectors

Manufacturing

Knowledge Management

Supply Chain Management

Human Resources and Talent Management

Research and Development

Healthcare

Education and Training

Regulatory Compliance

Customer Relationship Management (CRM)

Sales and Marketing

Finance and Banking

e-Business

Basic LLMs Tasks

Content Generation and Correction

Information Extraction

Text-to-Text Transformation

Semantic Search

Sentiment Analysis

Content Personalization

Ethical and Bias Evaluation

Paraphrasing

Language Translation

Text Summarization

Conversational AI

Question Answering

ML Scenarios & Tasks

Federated Learning

Meta-Learning (Learning to Learn)

Active Learning

Transfer Learning Involves leveraging knowledge from one task to improve learning in a related but different task. This is particularly useful when there is limited labeled data in the target domain.

Self-Supervised Learning A form of unsupervised learning where the data itself provides the supervision.

Multi-Task Learning Involves training a model on multiple related tasks simultaneously, sharing representations between tasks to improve generalization

Reinforcement Learning Involves training an agent to make a sequence of decisions by learning from interactions with an environment. The agent receives rewards or penalties and aims to maximize cumulative rewards. (game playing, robotics, and autonomous vehicles)

Semi-Supervised Learning This combines both labeled and unlabeled data to improve learning accuracy. It’s often used in cases where obtaining a large amount of labeled data is expensive or time-consuming.

Unsupervised Learning The model is given raw, unlabeled data and has to infer its own rules and structure the information.

Dimensionality Reduction

PCA, t-SNE

Clustering

Supervised Learning Uses labeled datasets to train algorithms to predict outcomes and recognize patterns

Regression

Classification

Binary MultiClass MultiLabel

AI/ML Projects Types

Domain-Specific

Innovation and R&D Projects

Technology and Software Development

Entertainment and Media

Public Safety

Agriculture

Education

Energy and Utilities

Transportation and Logistics

Insurance

E-commerce

Manufacturing and Logistics

Finance and Banking

Healthcare and Medicine

Technical Categorization

Speech Recognition and Audio Analysis

Computer Vision

Generative AI (LLMs)

Recommendation Systems

Text Mining and Natural Language Processing (NLP)

Predictive Modeling

Signal

Supervised/Unsupervised

Time-Series forcasting

Strategic Categorization Organizational goals, market positioning and industry-specific needs

Training and Development

Social Impact and Sustainability

Data-Driven Decision Support

Product and Service Innovation

Risk Management and Compliance

Customer Experience Enhancement

Optimization and Efficiency Projects

AI Strategy

Traditional Models(CPU) Local (With Less Data, Predictive Modeling)

Pretrained Models (CPU) Local (Moderate Data, Predictive Modeling)

Some Use Cases

MLOps /CI-CD

Cost

Applying LLMs (Required Large Data Mostly QA)

LLMs on Cloud All UCs

All Use Cases

MLOps /CI-CD (Level 2)

API (e.g. OpenAI), Some UCs

Some Use Cases

AI Wow

Output Quality

Cost (Long Term)

Cost (Short Term)

Run-Time

Transformer

PaLM Family

U-PaLM

PaLM-E

PaLM2

PaLM

Flan-PaLM

Med-PaLM M

Med-PaLM2

Med-PaLM

Distributed LLM Training

Optimizer Parallelism: Focuses on partitioning optimizer state and gradients to reduce memory consumption on individual devices.

Model Parallelism Combines aspects of tensor and pipeline parallelism for high scalability but requires complex implementation.

Hybrid Parallelism Combine pipeline and tensor parallelism for optimal performance based on the model architecture and available resources.

Tensor Parallelism Shards a single tensor within a layer across devices, efficient for computation but requires careful communication management.

Pipeline Parallelism Divides the model itself into stages (layers) and assigns each stage to a different device, reduces memory usage but introduces latency.

Data Parallelism Replicates the entire model a cross devices, easy to implement but limited by memory constraints.

PEFT

Limited Computational Resource

Fine-Tuning II

Our Dataset is Different from the Pre-Trained Data

Fine-Tuning I

Large Labeled Dataset is Avaiable

Background

LLMs Adaptation Stages

Multi-Turn Instructions

Single-Turn Instructions

Reasoning in LLMs

In-context

Zero-Shot

Fine-Tuning

Instruction-tuning

Transfer Learning

Alignment-tuning

RLHF

Pre-Training

Language Modeling

Architecture

Attention in LLMs

LLM Essentials

Prompting

- Zero-Shot Prompting - In-context Learning - Single and Multi -Turn Instructions

Language Modeling

- Full Language Modeling - Prefix Language Modeling - Masked Language Modeling - Unified Language Modeling

Transformers Architectures

- Encoder Decoder : This architecture processes inputs through the encoder and passes the intermediate representation to the decoder to generate the output. - Causal Decoder : A type of architecture that does not have an encoder and processes and generates output using a decoder, where the predicted token depends only on the previous time steps -0 Prefix Decoder : where the attention calculation is not strictly dependent on the past information and the attention is bidirectional - Mixture-of-Experts: It is a variant of transformer architecture with parallel independent experts and a router to route tokens to experts.

Fine Tuning

- Instruction-tuning - Alignment-tuning - Transfer Learning

NLP Fundamentals

Encoding Positions

- Alibi - RoPE

Tokenization

- Wordpiece - Byte pair encoding (BPE) - UnigramLM

Attention In LLMs

- Self-Attention : Calculates attention using queries, keys, and values from the same block (encoder or decoder). - Cross Attention: It is used in encoder-decoder architectures, where encoder outputs are the queries, and key-value pairs come from the decoder. - Sparse Attention : To speedup the computation of Self-attention, sparse attention iteratively calculates attention in sliding windows for speed gains. - Flash Attention : To speed up calculating attention using GPUs, flash attention employs input tiling to minimize the memory reads and writes between the GPU high bandwidth memory (HBM) and the on-chip SRAM.

LLM Components

Adaptation

Decoding Strategies

Alignment

Fine-tuning and Instruction Tuning

Model Pre-training

LLM Architectures

Positional Encoding

Tokenizations

LLMs Cpabilities

Augmented

Interacting with users

Virtual acting

Assignment planning

Tool utilization

Task decomposition

Knowledge base utilization

Tool planning

Self-improvement

Self-refinement

Self-cirtisim

Emerging

Reasoning

Arithmetic

Symbolic

Common Sense

Logical

Instruction following

Few-shot

Turn based

Task definition

In-context learning

Pos/Neg example

Symbolic reference

Step by step solving

Basic

Comprehension

Reading Comprehension

Simplification

Summarization

Multilingual

Crosslingual QA

Crosslingual Tasks

Translation

World Knowledge

Understanding of global issues and challenges

Awareness of global economic and political systems

Knowledge of international law and policies

Familiarity with different cultures and societies

Understanding global events and trends

Coding

Continuous learning and updating coding skills

Good understanding of algorithms and data structures for effective coding

Coding knowledge enables LLMs to automate tasks and improve efficiency

Proficient in coding to enhance their capabilities

Programming languages are a key skill for LLMs to develop

- Masked Language Modeling - Causal Language Modeling - Next Sentence Prediction - Mixture of Experts

- Supervised Fine-tuning - General Fine-tuning - Multi-turn Instructions - Instruction Following

- Decoder-Only - Encoder-Decoder - Hybrid

- Absolute Positional Embeddings - Relative Positional Embeddings - Rotary Position Embeddings - Relative Positional Bias