Understanding LLMs

What is an LLM?

 
A Large Language Model (LLM) is a sophisticated deep learning algorithm, specifically a neural network, designed to process and generate human language. These models are trained on enormous datasets, enabling them to perform various Natural Language Processing (NLP) tasks, including:
 
  • Language translation
  • Text summarisation
  • Question answering
  • Text generation
  • Sentiment analysis
  • Named entity recognition
  • Language modelling
 
Notable examples of LLMs include:
  • Encoder-based transformers (such as BERT and RoBERTa)
  • Generative models (like GPT-3 and GPT-4)
  • Meta's Llama models
 
Large language models are built on neural networks (NNs), computing systems inspired by the structure of the human brain. These networks use layered nodes, similar to neurons, to process information. Beyond teaching human languages to artificial intelligence (AI) applications, LLMs can be trained for diverse tasks, such as:
  • Understanding protein structures
  • Writing software code
  • And more
 
Much as people learn general skills before specialising, LLMs are first pre-trained and then fine-tuned to solve complex problems, including:
  • Text classification
  • Question answering
  • Document summarisation
  • Text generation
 
Their versatile problem-solving capabilities have far-reaching applications in fields like:
  • Healthcare
  • Finance
  • Entertainment
 
In these domains, LLMs power various NLP applications, including:
  • Translation
  • Chatbots
  • AI assistants
 
Large language models also have large numbers of parameters, which are akin to memories the model collects as it learns from training. Think of these parameters as the model's knowledge bank.
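To make "parameters" concrete, here is a minimal sketch that counts the weights and biases in a small stack of fully connected layers. The layer sizes are hypothetical and far smaller than anything in a real LLM, which also contains attention layers and embeddings not counted here.

```python
def dense_layer_parameters(inputs, outputs):
    """One weight per input-output connection, plus one bias per output."""
    return inputs * outputs + outputs


def count_parameters(layer_sizes):
    """Total parameters for consecutive fully connected layers."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(dense_layer_parameters(a, b) for a, b in pairs)


# Hypothetical toy network: 4 inputs, 8 hidden nodes, 2 outputs.
print(count_parameters([4, 8, 2]))  # 58
```

Every one of those numbers is adjusted during training, which is why the parameter count is often used as a rough measure of model size.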
 

So, what is a transformer model?

 
Let's explore the concept of Transformers, a groundbreaking neural network architecture introduced in "Attention is All You Need" by Vaswani et al. in 2017.
The Transformer is the most common architecture for large language models. The original design consists of an encoder and a decoder, although many current LLMs use only one of the two. It tokenizes the input and then performs many matrix operations in parallel to discover relationships between tokens, which enables pattern recognition. Transformer models utilize self-attention mechanisms, which process all tokens of a sequence at once and therefore train faster than sequential models like Long Short-Term Memory (LSTM) networks. Self-attention takes the context of the whole sequence into account when generating predictions.
Input is broken into tokens. For text these are words or pieces of words; for images or sound they could be small patches of the image or short chunks of the audio. Each token is associated with a vector (a list of numbers) that represents its meaning. Words with similar meanings (e.g., "bound," "jump," "skip," "leap") end up close together in this vector space.
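The idea that similar words sit close together can be sketched with cosine similarity. The three-dimensional vectors below are made up for illustration; a real model learns its vectors during training and uses hundreds or thousands of dimensions.

```python
import math

# Hypothetical embeddings, invented for this example only.
EMBEDDINGS = {
    "jump": [0.90, 0.80, 0.10],
    "leap": [0.85, 0.75, 0.20],
    "skip": [0.80, 0.90, 0.15],
    "bank": [0.10, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose vector points in the closest direction."""
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]))


print(most_similar("jump"))  # leap
```

With these toy values, "jump" lands next to "leap" and far from "bank", which is the behaviour the text describes.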
 
Vectors pass through operations called attention blocks, which let tokens exchange information and update their values. Attention determines which context words are relevant and updates meanings accordingly (e.g., "model" in "machine learning model" vs. "fashion model"). Vectors then pass through multi-layer perceptrons, also called feed-forward layers, which can be thought of as asking a series of questions about each vector (e.g., "Is this a character or symbol?", "Is it English?", "Is it a noun?"). Both steps consist largely of matrix multiplications, and each one updates the vectors with new numbers. The attention and perceptron layers repeat until the essential meaning of the passage has been baked into the very last vector in the sequence. A final operation on that vector generates a probability distribution over possible next tokens or text chunks.
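The attention step can be sketched in a few lines. This is a deliberately simplified single-head version that assumes the queries, keys, and values are the input vectors themselves; a real transformer first multiplies the inputs by learned weight matrices and runs many heads in parallel.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified self-attention with Q = K = V = the input vectors."""
    dim = len(vectors[0])
    updated = []
    for query in vectors:
        # Score how relevant every token is to the current one.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New vector = relevance-weighted mix of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        updated.append(mixed)
    return updated


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical token vectors
print(self_attention(tokens))
```

The same `softmax` function is what the final operation uses to turn the model's raw scores for each candidate token into a probability distribution.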
This step of predicting what comes next is then repeated, with each new token appended to the input, until complete sentences are formed.
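The repeated "predict, append, predict again" loop might look like the sketch below. The probability table is hypothetical and stands in for a trained model; it looks only at the previous token, whereas a real LLM conditions on the whole sequence and usually samples from the distribution instead of always taking the most likely token.

```python
# Hypothetical next-token probabilities standing in for a trained model.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.7, "runs": 0.3},
    "dog": {"runs": 0.8, "sleeps": 0.2},
    "sleeps": {"<end>": 1.0},
    "runs": {"<end>": 1.0},
}


def generate(max_tokens=10):
    """Greedy decoding: always append the most probable next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])


print(generate())  # the cat sleeps
```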
To turn a next-token predictor like this into a chatbot, you typically start with a short piece of text that establishes the setting of a user interacting with a helpful AI assistant. Something like, "What follows is a conversation between a user and a helpful, very knowledgeable AI assistant."
This text serves as the system prompt. The user's initial question or prompt is then added as the first bit of dialogue, and the model starts predicting what such a helpful AI assistant would say in response.
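As a rough sketch, assembling such a prompt is plain string formatting. The "User:" and "AI Assistant:" labels and the function name are assumptions made for this example; actual chat models each define their own conversation format.

```python
SYSTEM_PROMPT = (
    "What follows is a conversation between a user and a helpful, "
    "very knowledgeable AI assistant."
)


def build_chat_prompt(user_message, system_prompt=SYSTEM_PROMPT):
    """Combine the system prompt and the user's message into one text.

    The text ends right where the assistant's reply should begin, so the
    model's next-token predictions become the assistant's answer.
    """
    return f"{system_prompt}\n\nUser: {user_message}\nAI Assistant:"


print(build_chat_prompt("What is a transformer model?"))
```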
At a very high level this is the general idea of how transformers work.

What is the difference between LLMs and GenAI?

 
Generative Artificial Intelligence (GenAI) encompasses models capable of generating content across various forms, including text, code, images, videos, and music. Notable examples of generative AI include Midjourney, DALL-E, and ChatGPT. Large Language Models (LLMs) constitute a subset of generative AI, trained on textual data to produce textual content. ChatGPT is a popular example of generative text AI. In short, text-generating LLMs fall under the category of generative AI, but not all generative AI is an LLM.

How do LLMs work?

 
A large language model is based on a transformer model and works by receiving an input, encoding it, and then decoding it to produce an output prediction. But before a large language model can receive text input and generate an output prediction, it requires training, so that it can fulfill general functions, and fine-tuning, which enables it to perform specific tasks.
 
Training:
Large language models are pre-trained on large textual datasets from sites like Wikipedia, GitHub, and others. These datasets consist of trillions of words, and their quality affects the language model's performance. At this stage, the large language model engages in unsupervised learning, meaning it processes the datasets fed to it without specific instructions or labels. During this process, the model learns the meaning of words and the relationships between them. It also learns to distinguish words based on context. For example, it learns whether "right" means "correct" or the opposite of "left."
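One way to see why no hand-written labels are needed: the text itself supplies the answers. The sketch below turns a raw sentence into (context, next word) training pairs. Splitting on whitespace and the `context_size` parameter are simplifications chosen for this example; real models use subword tokenizers and much longer contexts.

```python
def next_token_examples(text, context_size=3):
    """Build (context, target) pairs where the target is the next word."""
    tokens = text.split()  # whitespace tokenization, a simplification
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        examples.append((context, tokens[i]))
    return examples


for context, target in next_token_examples("turn right at the corner"):
    print(context, "->", target)
```

Seeing "right" after "turn" in many such pairs, and after "that is" in others, is the kind of signal that lets a model separate the two meanings.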
 
Fine-tuning:
For a large language model to perform a specific task, such as translation, it must be fine-tuned for that particular activity. Fine-tuning continues training on task-specific data to optimize the model's performance on that task.
 
Prompt-tuning:
Prompt-tuning fulfills a similar function to fine-tuning: it steers a model toward a specific task through few-shot prompting or zero-shot prompting. A prompt is an instruction given to an LLM. Few-shot prompting shows the model how to predict outputs through the use of examples.
 
For instance, in this sentiment analysis exercise, a few-shot prompt would look like this:
Customer review: This cake tastes amazing!
Customer sentiment: positive
Customer review: This cake tastes horrible!
Customer sentiment: negative
 
The language model would understand, through the semantic meaning of "horrible," and because an opposite example was provided, that the customer sentiment in the second example is "negative."
Alternatively, zero-shot prompting does not use examples to teach the language model how to respond to inputs. Instead, it formulates the question as "The sentiment in 'This cake tastes horrible' is…" This clearly indicates which task the language model should perform, but does not provide problem-solving examples.
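The two prompt styles can be built with a couple of small helpers. This is only a sketch: the function names are invented for this example, and the resulting strings would still need to be sent to a model of your choice.

```python
def few_shot_prompt(examples, new_review):
    """Show labelled examples first, then leave the last label blank."""
    lines = []
    for review, sentiment in examples:
        lines.append(f"Customer review: {review}")
        lines.append(f"Customer sentiment: {sentiment}")
    lines.append(f"Customer review: {new_review}")
    lines.append("Customer sentiment:")
    return "\n".join(lines)


def zero_shot_prompt(review):
    """State the task directly, with no worked examples."""
    return f"The sentiment in '{review}' is"


examples = [("This cake tastes amazing!", "positive")]
print(few_shot_prompt(examples, "This cake tastes horrible!"))
print(zero_shot_prompt("This cake tastes horrible!"))
```

In both cases the prompt ends exactly where the answer should go, so the model's most natural continuation is the sentiment label.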
 
(Everything discussed from here on is based on information publicly available as of 1/1/2025. I have attached all my references at the end. Please use them to find the latest information.)

Difference between encoder models like BERT and generative models like GPT-3/4

 
Encoder models like BERT and generative models like GPT-3/4 are both built on the transformer architecture, but they serve distinct purposes:

Encoder Models (BERT, RoBERTa, etc.)

 
Masked Language Modeling: Predict missing words in text.
Natural Language Understanding (NLU): Sentence classification, sentiment analysis, question answering.
Pre-training: Learn contextual representations.
Fine-tuning: Adapt to specific tasks.

Generative Models (GPT-3/4, etc.)

 
Language Generation: Produce coherent text.
Text Completion: Continue a given piece of text.
Text-to-Text: Translation, summarization.
Zero-Shot Learning: Perform tasks without fine-tuning.

Key Differences:

 
Objective: BERT (understanding) vs. GPT-3/4 (generation).
Training: Masked language modeling (BERT) vs. Autoregressive language modeling (GPT-3/4).
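Architecture: Both build on the transformer, but BERT uses an encoder stack in which each token attends to context on both sides, while GPT-style models use a decoder stack in which each token attends only to earlier tokens. GPT-3/4 also have far more layers and parameters.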
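The contrast between the two training setups can be sketched as follows. The helper names are made up for this example; `[MASK]` is the placeholder token BERT uses, and the sentence is only illustrative.

```python
def masked_lm_example(tokens, mask_index, mask_token="[MASK]"):
    """BERT-style: hide one token; context on both sides stays visible."""
    inputs = list(tokens)
    target = inputs[mask_index]
    inputs[mask_index] = mask_token
    return inputs, target


def visible_positions(n, causal):
    """For each position i, list the positions it is allowed to attend to."""
    return [[j for j in range(n) if not causal or j <= i] for i in range(n)]


sentence = ["the", "cake", "tastes", "amazing"]
print(masked_lm_example(sentence, 2))
print(visible_positions(3, causal=False))  # BERT-style: every token sees all
print(visible_positions(3, causal=True))   # GPT-style: only earlier tokens
```

Seeing both sides of a gap suits understanding tasks, while seeing only the past is what lets an autoregressive model generate text one token at a time.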

Real-World Applications:

 
BERT: Sentiment analysis, question answering.
GPT-3/4: Content creation, chatbots, language translation.

What kind of AI Model is Meta’s Llama?

 
Meta's AI models are based on generative transformer-based architectures, specifically designed for natural language processing. Their model, LLaMA, is a foundational large language model trained on broad data, enabling adaptation to various downstream tasks.
LLaMA is comparable to other notable models like OpenAI's GPT-3 and Google's PaLM.

Key Features:

 
Generative Transformer-Based: LLaMA utilizes transformer architecture for generating human-like content.
Large Language Model: Trained on vast datasets for versatile applications.
Foundational Model: Adaptable to diverse tasks and domains.
Competitive Performance: Comparable to GPT-3 and PaLM.

Applications:

 
Natural Language Processing: Text generation, language understanding and more.
Task-Specific Models: Fine-tuned for instructions, conversations and specific domains.
Multimodal Capabilities: Some versions (such as the Llama 3.2 vision models) process text and image input, with potential for visual foundation models.

What kind of AI model is Google’s Gemini?

 
Google Gemini is a generative AI model, specifically a multimodal model, capable of processing and generating text, code, audio, image and video content.
It's designed for versatility and scalability, suitable for various applications, from natural language processing to computer vision tasks.

Key Features:

 
Multimodal Capabilities: Handles text, code, audio, image and video inputs and outputs.
Scalability: Efficiently runs on data centers, mobile devices and more.
State-of-the-Art Performance: Excels in benchmarks, demonstrating advanced capabilities.

Model Variants:

 
Gemini 2.0 Flash: Next-generation features, improved capabilities and multimodal generation.
Gemini 1.5 Flash: Balanced performance for diverse tasks.
Gemini 1.5 Pro: Optimized for complex reasoning tasks.
Gemini 1.0 Pro: Natural language tasks, multi-turn text and code chat.

Applications:

 
Natural Language Processing: Text generation, language understanding and more.
Computer Vision: Image recognition, object detection and more.
Code Generation: Assists developers with coding tasks.

References:

  1. https://www.elastic.co/what-is/large-language-models
  2. https://ai.meta.com/resources/
  3. https://www.youtube.com/watch?v=wjZofJX0v4M
  4. https://ai.google.dev/gemini-api/docs/models/gemini
  5. https://blog.google/technology/ai/google-gemini-ai/