The text delves into the nuances of advanced machine learning techniques and their applications, focusing on various methods of text embedding like absolute and relative positional embeddings, as well as the implementation of Rotary Position Embeddings.
Methodological Distinctions
(Based on Answer Generation)
Knowledge Based QA
Rule Based QA
Retrieval-Augmented QA
Extractive QA
Generative QA
Abstractive QA
UC - Data
Personalized Rec
Semi-Personalized Rec
Category Recommendation
Popularity Based
Recommendation
Reviews Sentiment Labeling
Reviews Summarization
Tag Based Search
Tag Analysis and/or Generation
Similarity Search
Business Sectors
Manufacturing
Knowledge Management
Supply Chain Management
Human Resources and Talent Management
Research and Development
Healthcare
Education and Training
Regulatory Compliance
Customer Relationship Management (CRM)
Sales and Marketing
Finance and Banking
e-Business
Basic LLMs Tasks
Content Generation and Correction
Information Extraction
Text-to-Text Transformation
Semantic Search
Sentiment Analysis
Content Personalization
Ethical and Bias Evaluation
Paraphrasing
Language Translation
Text Summarization
Conversational AI
Question Answering
ML Scenarios & Tasks
Federated Learning
Meta-Learning (Learning to Learn)
Active Learning
Transfer Learning
Involves leveraging knowledge from one task to
improve learning in a related but different task.
This is particularly useful when there is limited
labeled data in the target domain.
Self-Supervised Learning
A form of unsupervised learning where
the data itself provides the supervision.
Multi-Task Learning
Involves training a model on multiple related tasks
simultaneously, sharing representations between
tasks to improve generalization.
Reinforcement Learning
Involves training an agent to make a sequence of decisions by learning from interactions with an environment. The agent receives rewards or penalties and aims to maximize cumulative reward. Typical applications include game playing, robotics, and autonomous vehicles.
Semi-Supervised Learning
This combines both labeled and unlabeled data to improve learning accuracy. It’s often used in cases where obtaining a large amount of labeled data is expensive or time-consuming.
Unsupervised Learning
The model is given raw, unlabeled data and has to infer
its own rules and structure the information.
Dimensionality Reduction
PCA, t-SNE
Clustering
Supervised Learning
Uses labeled datasets to train algorithms
to predict outcomes and recognize patterns.
Regression
Classification
Binary
MultiClass
MultiLabel
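The three classification variants above differ mainly in how many labels a single example may carry. The sketch below illustrates that difference at prediction time. The function names and the 0.5 threshold are assumptions made for illustration, and the inputs are assumed to be probabilities already produced by some model.

```python
def predict_binary(prob, threshold=0.5):
    """Binary: one probability, one yes/no decision."""
    return prob >= threshold


def predict_multiclass(probs):
    """Multiclass: exactly one class wins (index of the highest probability)."""
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multilabel(probs, threshold=0.5):
    """Multilabel: every class is decided independently, so zero or more labels apply."""
    return [i for i, p in enumerate(probs) if p >= threshold]


if __name__ == "__main__":
    print(predict_binary(0.7))
    print(predict_multiclass([0.1, 0.7, 0.2]))
    print(predict_multilabel([0.9, 0.2, 0.6]))
```

In practice the multiclass probabilities would typically come from a softmax over all classes, while multilabel probabilities would come from one sigmoid per class.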
AI/ML Projects Types
Domain-Specific
Innovation and R&D Projects
Technology and Software Development
Entertainment and Media
Public Safety
Agriculture
Education
Energy and Utilities
Transportation and Logistics
Insurance
E-commerce
Manufacturing and Logistics
Finance and Banking
Healthcare and Medicine
Technical Categorization
Speech Recognition and Audio Analysis
Computer Vision
Generative AI (LLMs)
Recommendation Systems
Text Mining and Natural Language Processing (NLP)
Predictive Modeling
Signal
Supervised/Unsupervised
Time-Series Forecasting
Strategic Categorization
Organizational goals, market positioning
and industry-specific needs
Training and Development
Social Impact and Sustainability
Data-Driven Decision Support
Product and Service Innovation
Risk Management and Compliance
Customer Experience Enhancement
Optimization and Efficiency Projects
AI Strategy
Traditional Models (CPU) Local
(With Less Data, Predictive Modeling)
Pretrained Models (CPU) Local
(Moderate Data, Predictive Modeling)
Some Use Cases
MLOps /CI-CD
Cost
Applying LLMs
(Requires Large Data, Mostly QA)
LLMs on Cloud
All Use Cases
MLOps /CI-CD (Level 2)
API (e.g., OpenAI)
Some Use Cases
AI Wow
Output Quality
Cost (Long Term)
Cost (Short Term)
Run-Time
Transformer
PaLM Family
U-PaLM
PaLM-E
PaLM2
PaLM
Flan-PaLM
Med-PaLM M
Med-PaLM2
Med-PaLM
Distributed LLM Training
Optimizer Parallelism
Focuses on partitioning optimizer
state and gradients to reduce memory
consumption on individual devices.
Model Parallelism
Splits the model itself across devices, covering
both tensor and pipeline parallelism; highly
scalable but complex to implement.
Hybrid Parallelism
Combines pipeline and tensor
parallelism, choosing the mix
based on the model architecture
and available resources.
Tensor Parallelism
Shards a single tensor within
a layer across devices, efficient
for computation but requires
careful communication management.
Pipeline Parallelism
Divides the model itself into stages
(layers) and assigns each stage to a
different device, reduces memory
usage but introduces latency.
Data Parallelism
Replicates the entire model
across devices, easy to implement
but limited by memory constraints.
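To make the difference between two of these strategies concrete, here is a minimal sketch of tensor parallelism and data parallelism applied to a single linear layer. It is an illustration under simplifying assumptions: plain Python lists stand in for devices, no real communication happens, the helper names are hypothetical, and the dimensions are assumed to divide evenly by the number of devices.

```python
def matmul(x, w):
    """Multiply x (batch x d_in) by w (d_in x d_out); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def tensor_parallel_forward(x, w, num_devices):
    """Shard the weight matrix column-wise; every device sees the full input."""
    step = len(w[0]) // num_devices  # assumes d_out is divisible by num_devices
    shards = [[row[i * step:(i + 1) * step] for row in w] for i in range(num_devices)]
    partials = [matmul(x, shard) for shard in shards]  # one partial output per device
    # Stand-in for an all-gather: concatenate the partial outputs feature-wise.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


def data_parallel_forward(x, w, num_devices):
    """Shard the batch; every device holds a full copy of the weights."""
    step = len(x) // num_devices  # assumes the batch is divisible by num_devices
    batches = [x[i * step:(i + 1) * step] for i in range(num_devices)]
    outputs = [matmul(batch, w) for batch in batches]  # one replica per device
    return [row for out in outputs for row in out]


if __name__ == "__main__":
    x = [[1, 2], [3, 4], [5, 6], [7, 8]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(matmul(x, w) == tensor_parallel_forward(x, w, 2))
    print(matmul(x, w) == data_parallel_forward(x, w, 2))
```

Both variants reproduce the single-device result. What changes is what each device must store: a slice of the weights under tensor parallelism, or the full weights and a slice of the batch under data parallelism.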
PEFT
Limited Computational
Resource
Fine-Tuning II
Our Dataset is
Different from the
Pre-Trained Data
Fine-Tuning I
Large Labeled
Dataset is Available
Background
LLMs Adaptation Stages
Multi-Turn Instructions
Single-Turn Instructions
Reasoning in LLMs
In-context
Zero-Shot
Fine-Tuning
Instruction-tuning
Transfer Learning
Alignment-tuning
RLHF
Pre-Training
Language Modeling
Architecture
Attention in LLMs
LLM Essentials
Prompting
- Zero-Shot Prompting
- In-context Learning
- Single- and Multi-Turn Instructions
Language Modeling
- Full Language Modeling
- Prefix Language Modeling
- Masked Language Modeling
- Unified Language Modeling
Transformers Architectures
- Encoder-Decoder: The encoder processes the input and passes its
intermediate representation to the decoder, which generates the
output.
- Causal Decoder: An architecture with no encoder; a decoder both
processes the input and generates the output, and each predicted
token depends only on the previous time steps.
- Prefix Decoder: A decoder in which attention over the input
prefix is bidirectional rather than strictly dependent on past
information; generated tokens are still predicted causally.
- Mixture-of-Experts: A variant of the Transformer architecture
with parallel independent experts and a router that routes tokens
to experts.
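The difference between a causal decoder and a prefix decoder comes down to the attention mask. The sketch below builds both masks as nested lists of booleans, where `mask[i][j]` is True when position `i` may attend to position `j`. The function names and the `prefix_len` parameter are assumptions chosen for illustration.

```python
def causal_mask(seq_len):
    """Each position attends only to itself and earlier positions."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def prefix_mask(seq_len, prefix_len):
    """Bidirectional attention inside the first prefix_len tokens, causal afterwards."""
    return [[j <= i or j < prefix_len for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    for row in prefix_mask(4, 2):
        print(["x" if allowed else "." for allowed in row])
```

With `prefix_len=0` the prefix mask reduces to the causal mask, which is one way to see the prefix decoder as a relaxation of the causal one.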
Fine Tuning
- Instruction-tuning
- Alignment-tuning
- Transfer Learning
- Self-Attention: Calculates attention using queries, keys, and values from the same block (encoder or decoder).
- Cross Attention: Used in encoder-decoder architectures, where the queries come from the decoder and the key-value pairs come from the encoder outputs.
- Sparse Attention: To speed up self-attention, sparse attention computes attention iteratively in sliding windows rather than over the full sequence.
- Flash Attention: To speed up attention on GPUs, flash attention tiles the inputs to minimize memory reads and writes between the GPU high-bandwidth memory (HBM) and the on-chip SRAM.
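As a minimal sketch of the first and third items, the function below computes scaled dot-product attention over plain Python lists and optionally restricts each query to a sliding window of neighbouring positions. The `window` parameter and the function names are assumptions for illustration. The windowed variant here only skips out-of-window positions and is not an optimized sparse kernel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, window=None):
    """Scaled dot-product attention.

    If window is set, query i only attends to keys j with |i - j| <= window
    (this assumes queries and keys index the same sequence).
    """
    d = len(keys[0])
    output = []
    for i, q in enumerate(queries):
        idx = [j for j in range(len(keys)) if window is None or abs(i - j) <= window]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in idx]
        weights = softmax(scores)
        output.append([
            sum(w * values[j][c] for w, j in zip(weights, idx))
            for c in range(len(values[0]))
        ])
    return output


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention(x, x, x))            # self-attention over the full sequence
    print(attention(x, x, x, window=1))  # sliding-window variant
```

Calling `attention(x, x, x)` is self-attention because queries, keys, and values all come from the same sequence. Shrinking `window` reduces how many scores are computed per token, which is where the speed gain of sliding-window schemes comes from.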
LLM Components
Adaptation
Decoding Strategies
Alignment
Fine-tuning and Instruction Tuning
Model Pre-training
LLM Architectures
Positional Encoding
Tokenizations
LLMs Capabilities
Augmented
Interacting with users
Virtual acting
Assignment planning
Tool utilization
Task decomposition
Knowledge base utilization
Tool planning
Self-improvement
Self-refinement
Self-criticism
Emerging
Reasoning
Arithmetic
Symbolic
Common Sense
Logical
Instruction following
Few-shot
Turn based
Task definition
In-context learning
Pos/Neg example
Symbolic reference
Step-by-step solving
Basic
Comprehension
Reading Comprehension
Simplification
Summarization
Multilingual
Crosslingual QA
Crosslingual Tasks
Translation
World Knowledge
Understanding of global issues and challenges
Awareness of global economic and political systems
Knowledge of international law and policies
Familiarity with different cultures and societies
Understanding global events and trends
Coding
Continuous learning and updating coding skills
Good understanding of algorithms and data structures for effective coding
Coding knowledge enables LLMs to automate tasks and improve efficiency
Proficient in coding to enhance their capabilities
Programming languages are a key skill for LLMs to develop
- Masked Language Modeling
- Causal Language Modeling
- Next Sentence Prediction
- Mixture of Experts
Text Embedding
- Supervised Fine-tuning
- General Fine-tuning
- Multi-turn Instructions
- Instruction Following