LLM Explained: A Beginner's Guide

Sam Naji, Joseph Tekriti
June 5, 2023

You have probably seen the autocomplete feature on your phone or in Google Search. But have you ever wondered how it works?

Let's say you type the letters 'as'; your phone's autocomplete feature predicts what you want to type. To do so, it looks up all the words in its database (dictionary) that start with 'as' and scores each one based on its general frequency. It may come up with words like 'ask', 'asphalt' and 'Asia', which are more likely to occur than other words starting with 'as'.

Google's autocomplete feature operates similarly, primarily based on the frequency of past searches initiated. However, it has a limitation. It cannot predict something new that has yet to occur, even though everything must have its first occurrence, right?

Large Language Models work similarly, arguably better, because they can also predict text that has never appeared before.


Large Language Models (LLMs) are advanced computer programs that use artificial intelligence to interpret and generate natural language. These models employ deep learning techniques and extensive datasets to understand a given prompt and generate a suitable response. This lets them perform tasks such as language translation, question-answering, text summarization, and creative writing. LLMs are sometimes grouped under Generative AI, a broader type of artificial intelligence technology capable of producing various forms of content, including text, images, audio, and synthetic data.

Consider a scenario where we task an LLM with predicting the word that might follow 'I want to ______'. The model will attempt to predict which words from its vocabulary could fill in the blank, thereby completing the sentence. How does this happen? The model is trained on a massive amount of text from the internet, which covers most of the words in the English dictionary. This training allows the model to estimate the likelihood of certain words following others. Therefore, for every word in its vocabulary, the model generates a probability that the word completes the sentence 'I want to ____'.

The graph below illustrates some of the words and their likelihood of being chosen to complete the sentence. As seen from the graph, words like 'Play', 'Dance', and 'Sleep' have higher probabilities, while 'Cry' and 'Study' have lower occurrences based on the vast data from the Internet. 'Black Hole' has a very low probability due to its grammatical inaccuracy and lesser usage.
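
To make the idea of "a probability for every candidate word" concrete, here is a minimal sketch that estimates next-word probabilities by simple counting. The toy corpus and the function name are assumptions made for illustration; a real LLM learns these probabilities with a neural network rather than by counting.

```python
from collections import Counter

# Hypothetical toy corpus; a real model learns from billions of sentences.
corpus = [
    "i want to play",
    "i want to play",
    "i want to dance",
    "i want to sleep",
    "i want to study",
]

def next_word_probabilities(sentences, context):
    """Estimate P(next word | context) from raw counts."""
    counts = Counter()
    prefix = context.split()
    for sentence in sentences:
        words = sentence.split()
        # Slide over the sentence and count the word that follows the prefix.
        for i in range(len(words) - len(prefix)):
            if words[i:i + len(prefix)] == prefix:
                counts[words[i + len(prefix)]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

probs = next_word_probabilities(corpus, "i want to")
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.2f}")
```

In this toy corpus 'play' follows the context most often, so it receives the highest probability, and the probabilities of all candidates add up to 1.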

Types of Large Language Models

There are various ways to categorize these models. In this context, we have divided them into two primary categories: Base LLMs and Instruction Tuned LLMs.

As the name implies, base LLMs are large language models that predict the next word based on the training text data. These models are trained on copious text data and learn to anticipate the most probable next word, considering the context of the sentence. GPT-3 is an example of a Base LLM. It has been trained on considerable text data and can generate coherent and grammatically correct sentences.

Instruction Tuned LLMs, on the other hand, are large language models that strive to adhere to the provided instructions. These models are fine-tuned on examples of instructions and desired responses, learning to generate text that complies with the given instructions. FLAN-T5, an instruction-tuned variant of T5, is an example of an Instruction Tuned LLM, which can handle various tasks such as text summarization, question answering, and translation. By fine-tuning a model further for a specific task, it can generate more accurate and relevant results for that task.

Word Embeddings

Let's delve into how an LLM assigns each word in the dictionary a numerical representation, also known as a vector, and uses it to predict the subsequent word in a sentence. As a first attempt, let's assign numbers from 1 to n to each word alphabetically, where n depends on the dictionary used. However, this approach may assign vastly different values to synonyms like 'assume' and 'presume', even though we want similar words to have closer values. Techniques like 'word2vec' and 'GloVe', known as Word Embeddings, are used for this purpose.

Word embedding is a type of natural language processing technique that represents words as vectors in a high-dimensional space. The position of each word vector is determined by the contexts in which the word appears in a large text corpus. For instance, if the computer sees the words 'Programming' and 'Language' used together, it learns that they are related, such as both being used in the context of computer science. This understanding helps the computer interpret the meaning of new sentences containing these words. Word embeddings prove effective in various natural language processing tasks, including machine translation, named entity recognition, and sentiment analysis.
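
A common way to check whether two word vectors are "close" is cosine similarity. The sketch below uses hand-picked 3-dimensional vectors, which are purely hypothetical; real word2vec or GloVe embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
embeddings = {
    "assume": [0.90, 0.10, 0.30],
    "presume": [0.85, 0.15, 0.35],
    "asphalt": [0.10, 0.90, 0.20],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

similar = cosine_similarity(embeddings["assume"], embeddings["presume"])
different = cosine_similarity(embeddings["assume"], embeddings["asphalt"])
print(f"assume vs presume: {similar:.3f}")
print(f"assume vs asphalt: {different:.3f}")
```

With these made-up vectors, the synonyms score close to 1 while the unrelated pair scores much lower, which is the property the alphabetical numbering scheme lacks.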

History of Transformers

Different neural network architectures suit different types of data. Prior to the advent of Transformers, recurrent neural networks (RNNs) and convolutional neural networks (CNNs) were the primary neural network architectures for natural language processing (NLP) tasks.

RNNs were particularly well-suited for NLP tasks that involved sequential data processing, such as language modeling and machine translation. However, they encountered the issue of vanishing gradients, which made it challenging for the model to learn long-range dependencies between sequence elements. To address this, researchers introduced RNN variants like gated recurrent units (GRU) and long short-term memory (LSTM), designed to better capture long-term dependencies.

CNNs, initially developed for image processing tasks, were later adapted for NLP tasks such as text classification and sentiment analysis. They were capable of learning local patterns in text but were limited by fixed-length windows for processing input sequences.

Both these models had limitations, leading to the introduction of Transformers. The Transformer architecture was designed to effectively address the problem of long-range dependencies. The attention mechanism used in Transformers allows the model to focus on different parts of the input sequence selectively. This enables the model to learn relationships between sequence elements without being constrained by the fixed-length context windows used in previous models.


A transformer is like a robot that helps computers understand language better by using a unique trick to analyze the words in a sentence and figure out what they mean. When we read a book, we use not just a single word but also the words around it to understand the meaning and get the whole picture of the sentence. Similarly, the transformer looks at all the words in a sentence at the same time and uses the relationships between them to figure out how each word is used. Below is a simple explanation of the model.

  1. The words are converted into numbers using word embedding methods where similar words are given similar values so that the neural network model can use them.
  2. We use an attention network that takes the words as inputs. For each word in turn, it estimates how much that word is related to every other word and assigns each of them a weight between 0 and 1. Then it takes the weighted sum of the word vectors and encodes it as a 'context vector'. This process is repeated for every word, so a different context vector is generated for each word.
  3. This result is fed into the next layer, along with the original word. Let's call this layer the 'next word predictor'. This layer tells the attention network what it needs to learn to understand the relationship between words and predict the following words better. If the predicted word is incorrect, an algorithm called back-propagation lets the network know to increase the attention value for specific words and decrease the weights (importance) of the words that led to the prediction of the incorrect word.
  4. So now, when we feed a word to this network, it gives multiple suggestions for the choice of the next word, and one of those suggestions is chosen randomly (this depends on the probability associated with the word and some settings values like temperature, more on this later) and this goes on until the completion of the sentence. This stack of attention and prediction layer is the basic structure of a transformer.
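
Step 2 above can be sketched in a few lines. This is a simplified illustration, assuming tiny hand-written 2-dimensional embeddings and using the raw embeddings directly as queries, keys, and values; a real transformer first passes them through learned projection matrices.

```python
import math

def softmax(scores):
    """Convert raw scores into weights between 0 and 1 that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(query, vectors):
    """Return (weights, context vector) for one word attending to all words."""
    scale = math.sqrt(len(query))
    # Relatedness score between the current word and every word (dot product).
    scores = [sum(q * v for q, v in zip(query, vec)) / scale for vec in vectors]
    weights = softmax(scores)
    # Weighted sum of the word vectors gives the context vector.
    context = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(len(query))
    ]
    return weights, context

# Hypothetical embeddings for the sentence "the cat sat".
sentence = {"the": [0.1, 0.0], "cat": [1.0, 0.5], "sat": [0.8, 0.6]}
vectors = list(sentence.values())

weights, context = attention_context(sentence["cat"], vectors)
print("attention weights for 'cat':", [round(w, 3) for w in weights])
print("context vector for 'cat':", [round(c, 3) for c in context])
```

Calling `attention_context` once per word produces one context vector per word, which is what the next layer consumes.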

Large Language Models stack many of these transformer layers to improve the accuracy and knowledge of the model. GPT-3 has 96 of these layers. Stacking these networks allows for higher-level reasoning: the lower layers focus on word relationships and syntax, while the higher layers encode more complex relationships that capture semantics.

Figure 1: (a) In the Encoder-Decoder architecture, the input sequence is first encoded into a state vector, which is then used to decode the output sequence. (b) A transformer layer; the encoder and decoder modules are built from stacks of transformer layers. (Source: Overview of Large Language Models: From Transformer Architecture to Prompt Engineering (holisticai.com))

The transformer architecture uses an Encoder and Decoder to process data sequences. In natural language processing, the input sequence can be a sentence or a paragraph, and the output sequence can be another sentence or a translation in another language.

Let's consider the example of machine translation from English to French. Here's how the Encoder and Decoder work in the Transformer:

  1. The Encoder takes the input sequence of words (in English) and converts each word into a vector representation called an embedding. The embeddings help the model understand the meaning of each word in the sequence.
  2. The Encoder then processes these embeddings together, building up a representation of the entire sequence that captures the relationships between the words. This task is done through multiple self-attention layers, where the model learns to weigh the importance of each word in the context of the whole sequence.
  3. The final output of the encoder is a set of encoded vectors that capture the meaning of the input sequence.
  4. The Decoder then takes these encoded vectors as input and starts generating the output sequence (in French) word by word.
  5. At each step, the Decoder uses self-attention to weigh the importance of the previously generated words in the context of the output sequence so far. This weighting helps the model generate coherent and grammatically correct sentences.
  6. The Decoder also has access to the encoder's output through a mechanism called cross-attention. This allows the model to specifically focus on different parts of the encoded input sequence when generating each word of the output sequence.
  7. The output of the Decoder is a sequence of French words that captures the same meaning as the English input sequence.

Overall, the Encoder-Decoder architecture in a Transformer allows the model to process data sequences in a way that captures the relationships between the sequence elements. This is particularly important for natural language processing tasks like machine translation or text generation, where understanding the meaning and context of the input sequence is crucial for generating high-quality output.

Figure 2: The GIF above shows how an LLM using transformers understands the relationship between words. (63f8df8c87f95232f94ad05c_Holistic-AI-Figure-2.gif (1960×970) (webflow.com))

Settings you should know about using LLM

  1. Temperature: Temperature is a parameter that increases or decreases the "confidence" a model has in its most likely response. The lower the temperature, the more deterministic the result, because the most probable next token is picked almost every time. Increasing the temperature leads to more randomness, encouraging more varied or creative outputs, because you are increasing the weights of the other possible tokens. In terms of application, you should use a lower temperature for tasks like fact-based Q.A. to encourage more accurate and concise responses. For poem generation or other creative tasks, it might be beneficial to increase the temperature. For example:
    Prompt: "What is the capital of France?"
    Low-temperature response: "The capital of France is Paris."
    High-temperature response: "The capital of France is definitely Paris, but some people might argue that it's Marseille or Nice."
  2. Top-K Sampling: this setting restricts the selection of the next token to the top k most probable tokens. The k value can be set to any number, and the higher the value of k, the more diverse the generated text will be. This means that with a higher k value, the model can select from a wider range of possible tokens. On the other hand, a lower k value makes the model choose from a smaller range of probable tokens. In terms of application, you should use a higher k value to encourage more diverse and creative outputs. A lower k value might be more appropriate for tasks requiring more factual responses.
    Prompt: "Describe the color of the sky."
    Top-k response (k=3): "The sky is blue, with shades of white and light gray."
    Top-k response (k=10): "The sky can be blue, gray, pink, or orange depending on the time of day and weather conditions."
  3. Top-p (Nucleus) Sampling: This setting is closely related to top-k sampling. Instead of keeping a fixed number of tokens, it restricts the choice of the next token to the smallest set of most likely tokens whose cumulative probability reaches a certain threshold (p). Tokens outside this set are discarded. As a result, the model can generate diverse responses while still maintaining overall coherence. The p value can be set to any number between 0 and 1, and the higher the p value, the more diverse the generated text will be. Decreasing the value of p leads to more deterministic and predictable outputs, as the model only considers the most likely words. For example:
    Prompt: "Describe a cat."
    Top-p response (p=0.5): "A cat is a small mammal with fur and whiskers that is often kept as a pet."
    Top-p response (p=0.9): "A cat is a four-legged animal that is known for its hunting abilities and often seen grooming itself with its tongue."
  4. Beam Search: The model keeps several candidate sequences in parallel and selects the most probable sequence based on a score. The score combines the probabilities of the words in the sequence and is usually adjusted by a length penalty so that sequences of different lengths can be compared. Depending on how the penalty is defined, it pushes the model toward shorter or longer sequences. The number of candidates kept at each step is called the beam width. Beam search generates more coherent and fluent text but can be less creative than other methods like top-k sampling.
    Prompt: "What do you think about pineapple on pizza?"
    Beam search response (low value): "I think pineapple on pizza can be a controversial topic. Some people love it, while others think it's an abomination."
    Beam search response (high value): "Pineapple on pizza is a divisive issue. While some people swear by it, others can't stand the thought of it."
  5. Maximum Length: Maximum length is a parameter that controls the maximum number of tokens that the model will generate in response to a prompt. Setting a lower maximum length will result in shorter responses, while a higher maximum length will result in longer responses. For example:
    Prompt: "Who is the current President of the United States?"
    Maximum Length set to 10: "The current President of the United States is Joe Biden."
    Maximum Length set to 20: "The current President of the United States is Joe Biden, who was inaugurated on January 20, 2021."
  6. Frequency Penalty: Frequency penalty is a parameter that penalizes new tokens based on their existing frequency in the text generated so far. This means the model will be less likely to generate tokens already generated multiple times. Setting a higher frequency penalty will result in more unique tokens being generated. For example:
    Prompt: "Tell me about your favorite animal."
    Frequency Penalty set to 0: "My favorite animal is a cat. I love cats so much. They are the best pets."
    Frequency Penalty set to 0.5: "My favorite animal is a cat. I love them because they are soft and cuddly. Other animals are cool too, but cats are my favorite."
    Frequency Penalty is similar to Presence Penalty. The only difference is that Presence Penalty makes the model less likely to generate any token that has already been generated, regardless of how often it has appeared.
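
Several of the settings above can be viewed as small transformations applied to the model's raw scores (logits) before a token is sampled. The following is a rough sketch under that assumption: the vocabulary, the scores, and the function names are hypothetical, and the penalty formula is one common formulation rather than the exact rule of any particular model.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature must be greater than 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def apply_penalties(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of tokens that already appeared in the generated text."""
    return [
        logit - count * frequency_penalty - (presence_penalty if count > 0 else 0.0)
        for logit, count in zip(logits, counts)
    ]

# Hypothetical vocabulary and scores for the next token.
vocab = ["blue", "gray", "pink", "orange", "green"]
logits = [3.0, 2.0, 1.0, 0.5, -1.0]

cold = softmax_with_temperature(logits, temperature=0.5)  # sharper distribution
hot = softmax_with_temperature(logits, temperature=2.0)   # flatter distribution
top2 = top_k_filter(softmax_with_temperature(logits), k=2)
nucleus = top_p_filter(softmax_with_temperature(logits), p=0.9)

random.seed(0)  # fixed seed so the sketch is repeatable
print("sampled token:", random.choices(vocab, weights=nucleus, k=1)[0])
```

Lowering the temperature concentrates probability on 'blue', while top-k and top-p zero out the unlikely tail before sampling. The frequency term grows with each repetition of a token, whereas the presence term is a flat deduction applied once a token has appeared at all.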


Creating effective prompts can be tricky; small changes in word choice or word order can impact the model tremendously, and there is only so much you can fit inside a single prompt. Even with a good prompt, the quality of the model's responses can be inconsistent. To overcome these issues, we perform something called Tuning. Tuning is the process of adjusting the model's parameters to optimize its performance on a particular task.

In Fine Tuning, we take a model that has been pre-trained on a large dataset, make a copy of it, and then, using the learned weights (parameters) as a starting point, retrain the copy on a new domain-specific dataset. This approach is useful when limited data is available for the target task, as it allows the model to transfer its knowledge from the larger dataset to the specific task. However, there are some challenges when we try to fine-tune LLMs. LLMs are trained on vast amounts of data and have a very large number of weights, so updating all of them can take so much cost and work that it might not be worth it for a business.

To address this problem, we have something called Parameter Efficient Tuning. Instead of retraining all the weights, we retrain a part (subset) of the weights or add new weights when training the model on a domain-specific dataset. A related way to cut cost is knowledge distillation, where we train a smaller and more efficient model to mimic the behaviour of a larger and more complex model. This approach allows the smaller model to perform similarly to the larger model while using fewer computational resources. For more detail, see 2303.15647.pdf (arxiv.org). Which methodology is optimal for which type of LLM is still an open area of research, but the main benefit is that you can have your own domain-specific model at lower cost and with less work.
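
To get a feel for why adding a few new weights is cheaper than retraining everything, the arithmetic below compares the number of trainable weights in both cases. It assumes a low-rank adapter layout (two small added matrices next to a frozen weight matrix), which the article does not name; the layer size and rank are hypothetical.

```python
def full_finetune_params(d_in, d_out):
    """Weights updated when a whole d_in x d_out matrix is retrained."""
    return d_in * d_out

def low_rank_adapter_params(d_in, d_out, rank):
    """Weights in two small added matrices: (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)

d_in = d_out = 4096  # hypothetical layer size
rank = 8             # hypothetical adapter rank

full = full_finetune_params(d_in, d_out)
added = low_rank_adapter_params(d_in, d_out, rank)
print(f"full: {full:,}  adapter: {added:,}  share: {added / full:.4%}")
```

Under these assumptions the added weights are well under one percent of the original matrix, which is where the cost saving comes from.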

How to Finetune? (source-Platform.openai)

Fine-tuning a pre-trained GPT model involves adapting it to a specific task or domain, such as generating text for a specific purpose like news article generation or answering questions. Following is a step-by-step guide on how to finetune GPT:

  1. Collect Data: The first step is to collect a large amount of relevant data from the domain you want to fine-tune the GPT model for. For example, if you want to fine-tune the GPT model for medical text, you must gather medical text data for it to learn from.
  2. Prepare Data: Once you have collected the data, the next step is to prepare it. This includes cleaning, formatting, and tokenizing the text. Tokenization means breaking the text into smaller units, such as words or sub-words. Your data should be in JSON Lines (JSONL) format, where each line is a JSON object holding a prompt-completion pair that corresponds to one training example. It should look like the following: {"prompt": "<prompt text>", "completion": "<ideal generated text>"}. You can also use a CLI data preparation tool offered by OpenAI, which validates the data, provides suggestions, and reformats it, in the following way:

openai tools fine_tunes.prepare_data -f <JSON FILE>

Replace <JSON FILE> with the path to your JSON file.

The part enclosed in {} is an example of how data in the JSON file should be structured for fine-tuning. Each entry in the JSON file should be a dictionary with two keys: "prompt" and "completion". The value for "prompt" is the input you want the model to see, and "completion" is the target output you want the model to generate.

For example, one entry in your JSON file might look like this:

    {"prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}

This means that for the prompt "Item=handbag, Color=army_green, price=$99, size=S->", you're instructing the model that the desired response should be "This stylish small green handbag will add a unique touch to your look, without costing you a fortune."
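
If your examples live in Python, a file with one prompt-completion object per line can be produced with the standard json module. This is a minimal sketch; the file name and helper functions are hypothetical, and the record simply reuses the handbag example above.

```python
import json

# Training examples in the prompt/completion format described above.
examples = [
    {
        "prompt": "Item=handbag, Color=army_green, price=$99, size=S->",
        "completion": " This stylish small green handbag will add a unique touch "
                      "to your look, without costing you a fortune.",
    },
]

def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

write_jsonl(examples, "train.jsonl")  # hypothetical file name
assert read_jsonl("train.jsonl") == examples
```

Reading the file back and comparing it with the original records is a cheap way to catch quoting or encoding mistakes before uploading the data.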

  3. Fine-Tune the Model: You can start the fine-tuning process after preparing the data. Fine-tuning involves loading a pre-trained GPT model and training it on the collected data for a specific task. For example, if you want to generate medical reports, you can fine-tune the GPT model on the medical data to generate reports. You can use the OpenAI CLI command as shown below:

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m <BASE_MODEL>

Here, <BASE_MODEL> is the name of the model you want to start from, such as davinci. Executing the above command does the following:

  • Uploads the file using the file API
  • Creates a fine-tune job
  • Streams events until the completion of the task. When the task is completed, it should display the name of the fine-tuned model
  4. Evaluate Model: After fine-tuning the GPT model, you must evaluate its performance. This involves checking how well the model performs on a validation dataset. Validation runs through the same fine_tunes.create command that was used for training the fine-tuned model; you pass in the validation dataset using the -v parameter, as shown below:

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> \
-v <VALIDATION_FILE_ID_OR_PATH> \
-m <MODEL>

If the performance is not satisfactory, you can go back to the fine-tuning step and adjust the parameters.

  5. Use the Model: After you have fine-tuned the GPT model and evaluated its performance, you can use it to generate text for the specific task it was fine-tuned for. You can use 'completions.create' on the OpenAI CLI, passing the fine-tuned model as a parameter to OpenAI's 'Completions API', and you can also use the model in their 'Playground'. For example, if you fine-tuned the model for generating medical reports based on input data, you can use it to generate medical reports.

Fine-tuning a pre-trained GPT model involves preparing the data, selecting a pre-trained GPT model based on the task at hand, and then fine-tuning the model on the collected data for the specific task to improve its performance.

Power of Large Language Models

Large Language Models (LLMs) have recently become the centre of attention in natural language processing (NLP). The use of LLMs has significantly impacted various fields, including education, healthcare, finance, and communication. Below are some of the applications of LLMs, which give insight into how these models are likely to be used in the future.

  1. Automated Writing Assistance: Large language models can be used to assist writers in generating content for their writing by suggesting words, phrases, and even entire sentences. This assistance could benefit non-native speakers or people with writer's block. They can even generate feedback using these models.
  2. Improved Chatbots and Virtual Assistants: Large language models can help improve chatbots and virtual assistants by allowing them to understand better and respond to natural language queries. So your Siri or Alexa will work even better now.
  3. Personalized Content Creation: By analyzing data on a user’s preferences and behaviour, large language models can generate personalized content such as news articles, social media posts, and product recommendations. This way, the users can focus on their choice of content from the enormous data on the internet.
  4. Advanced Natural Language Processing: Large language models can improve natural language processing tasks such as sentiment analysis, entity recognition, and question answering. They can learn to identify sentiment by analyzing the context and emotional cues present in the text, and they can be trained to recognize entities by analyzing the patterns and relationships between words. They can also improve the accuracy and efficiency of question-answering systems by learning from large amounts of text and understanding the relationships between words and concepts in it.
  5. Medical Diagnosis and Treatment: Large language models can be trained on vast amounts of medical data to aid in diagnosing and treating diseases. This application could lead to more accurate diagnoses and personalized treatment plans, making research work much more manageable. This will also aid rural areas and areas with limited access to healthcare.
  6. Automated Translation: Large language models can help improve automated translation services by better understanding the nuances of different languages and providing more accurate translations, helping to solve language barrier problems.
  7. Improving Accessibility: Large language models can help improve accessibility for people with disabilities by providing real-time captioning, sign language interpretation, and other language-related services.
  8. Predictive Analytics: Large language models can analyze vast amounts of textual data, such as news articles and social media posts, to identify ongoing trends and make predictions in various fields, including finance, politics, and marketing. This application is especially beneficial for businesses focused on marketing.

Limitations of LLM

  • Language Limitations: Large Language Models are trained on vast amounts of data, but this data is often concentrated in a few languages, typically English. This can make it difficult for these models to perform well on tasks related to other languages or dialects that have less available training data. Researchers are improving these models' performance on multilingual tasks by incorporating more diverse data sources and using techniques such as cross-lingual transfer learning.
  • Bias: Large Language Models are also known to exhibit bias, which can result from the training data used. Biases in language models can perpetuate stereotypes and other forms of discrimination, which can have real-world consequences. Researchers are exploring ways to mitigate this issue, such as developing methods to detect and correct bias in language models.
  • Computationally Intensive: Large Language Models require significant computational resources to train and execute operations, which can limit their accessibility and sustainability. Researchers are exploring ways to make these models more efficient, such as developing model compression and quantization techniques.
  • Token Limit: One of the significant limitations of large language models is the number of tokens they can handle in a single input prompt. Most large language models have a fixed limit on the number of tokens they can process, which can restrict the complexity of prompts or questions that can be asked. This token limit can impact the quality of responses and limit the models' ability to understand and generate nuanced or lengthy text. This can be a significant issue in fields like scientific research, where complex prompts with multiple variables are required. Researchers are exploring various approaches to address this issue, such as hierarchical models, chunking of prompts, and generating relevant sub-prompts based on the initial prompt. These approaches aim to divide a lengthy prompt into smaller, more manageable parts that can be processed by the model separately, then combined to generate a final response. However, these methods are still in the research phase and require further development and testing before they can be used in practical applications.
  • Lack of Common Sense: Large Language Models still lack common sense reasoning abilities despite their impressive capabilities. This means they may struggle with tasks that require understanding everyday knowledge and context, such as answering questions about the real world. Researchers are working on incorporating more common-sense knowledge into these models, such as using external knowledge graphs or ontologies.
  • Ethical Concerns: Using Large Language Models raises ethical concerns, such as potential misuse or unintended consequences. For instance, someone might use these models to write convincing spam messages or to learn about similar unethical techniques. Researchers and practitioners are working to develop ethical guidelines and frameworks for developing and deploying these models and to address issues such as privacy, transparency, and accountability.
  • Limited Explainability: Large Language Models can be difficult to understand or explain, making it challenging to interpret how they arrive at their results or diagnose performance issues. Researchers are exploring ways to improve the interpretability of these models, such as developing methods for visualizing and explaining their inner workings.
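
The chunking idea mentioned under Token Limit can be sketched as follows. This is only an illustration: it assumes whitespace-separated words as a stand-in for tokens, whereas a real tokenizer splits text differently, and the function name is hypothetical.

```python
def chunk_prompt(text, max_tokens):
    """Split text into pieces of at most max_tokens whitespace-separated words."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

long_prompt = "word " * 25  # hypothetical 25-word prompt
chunks = chunk_prompt(long_prompt, max_tokens=10)
print([len(chunk.split()) for chunk in chunks])  # prints [10, 10, 5]
```

Each chunk could then be sent to the model separately and the partial answers combined, which is the step that remains hard to do well in practice.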

Maintaining a large language model can be expensive, both in terms of hardware resources and financial costs. These models require significant amounts of memory and storage to hold their massive parameters and data, which can be difficult for organizations with limited resources. The cost of regularly maintaining and updating these models can also add up, as it requires a team of experts to ensure that the models are up-to-date and performing optimally. This cost can affect the efficiency and quality of the results produced by a large language model. For example, a model running on inadequate hardware may struggle to process complex prompts, leading to slow response times and lower-quality results. Organizations must consider the cost of maintaining and upgrading large language models when deploying them to ensure they can maintain their efficiency and quality over time.

Future of Large Language Models

The future of large language models is promising as they continue improving in accuracy, speed, and efficiency. One significant development in the future of large language models is their ability to understand better and generate more complex human-like language. This means that these models will be able to perform more advanced natural language processing tasks, such as understanding sarcasm and idioms. Another critical development is the ability to train large language models on multiple languages simultaneously, enabling them to understand and process multiple languages efficiently. In addition, large language models will become more specialized in their applications, with more models being explicitly trained for specific industries or tasks, such as medical diagnosis or financial analysis. As technology advances, large language models will likely become even more efficient, enabling them to process vast amounts of data in real time. Overall, the future of large language models is bright, and we can expect these models to continue revolutionizing natural language processing in the coming years.

Some Popular Large Language Models

  • ChatGPT- ChatGPT is a state-of-the-art language model developed by OpenAI. It is a type of artificial intelligence that can understand natural language and generate responses that are similar to human conversation. With a vast amount of training data and sophisticated algorithms, ChatGPT can generate coherent and relevant responses to a wide range of prompts. It is an impressive tool that has revolutionized the field of natural language processing. The responses generated by ChatGPT are often accurate, informative, and even entertaining at times. However, like any other language model, ChatGPT is not perfect and can sometimes produce irrelevant or nonsensical responses. Nonetheless, it remains one of the most advanced language models available today. To access ChatGPT, you can visit 'https://www.chat.openai.com'. You will see a text box where you can enter your message or question, and the model will process your input and generate a response. Below is a screenshot of the website; the highlighted part shows where you get the API key. Please note that you can only copy the key once, so make sure you save it somewhere safe before closing the tab.

  • Llama- Llama, developed by Meta AI, is designed to generate coherent and fluent text in response to a given prompt or task. The model has been pre-trained on a large corpus of text data, and it can be fine-tuned to adapt to a specific task or domain. Llama has been used by various organizations and research institutions to generate text for a wide range of applications, including text summarization, machine translation, and language generation. To access Llama, visit ‘https://huggingface.co/chat’ and interact with the AI bot.

DEV Section


GPT-3.5-turbo is an advanced version of the GPT-3 language model that has been trained on a massive amount of data to generate human-like text. It can perform a wide range of natural language processing tasks, such as language translation, question answering, and content creation. It improves on its predecessor, GPT-3, in speed, accuracy, and output quality, and it has been praised for generating text that is almost indistinguishable from human-written text. The steps below show how to call it from Python.

1. First, sign up for an API key on the OpenAI website. Once you have created an account, navigate to the API page and generate your API key.

2. Install the openai package in your Python environment using pip by running the command pip install openai in your terminal or command prompt. The examples in this guide use the openai.ChatCompletion interface, which is only available in versions of the package before 1.0, so with a newer release you may need pip install "openai<1.0" instead.

3. Once you have installed the package, use the openai module in your Python code to interact with gpt-3.5-turbo. Here is a simple example:

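import openai
openai.api_key = "Your_API_Key"  # replace with your own key
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": "Hi, How are you?"}],
)
print(response.choices[0].message["content"])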

In this example, we first import the openai module and set our API key. Then, we use the ChatCompletion.create() method to generate a response from the gpt-3.5-turbo model. The messages parameter is a list containing a single dictionary with two key-value pairs: role and content. Since gpt-3.5-turbo is a chat-based LLM, the role key is set to "system" to indicate that the input message is from the system, and the content key contains the actual input message, which is "Hi, How are you?". The model parameter selects which model processes the input message and generates the response.

4. Finally, we print the generated response by accessing the message content of the first choice in the response object's choices list.

Some functions that might help you interact with the model more efficiently

This function generates a response from the model when it is given a prompt of type string, so you need to create a string prompt and pass it to the function. You can also print the result instead of returning it every time. A small example of how the function works is shown below:

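import os
import openai
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # load variables from a local .env file, if one exists

openai.api_key  = 'Your_API_Key'  # replace with your own key

def get_completion(prompt, model="gpt-3.5-turbo"):
    # Wrap the prompt in the list-of-dictionaries format the chat API expects
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
    )
    # The reply text is the message content of the first choice
    return response.choices[0].message["content"]

# Example call (needs a valid API key and network access)
print(get_completion("tell me a joke"))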
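
The snippet above writes the API key directly into the source file. Since the key can only be copied once and should be kept safe, one common alternative is to read it from an environment variable. The following is a minimal sketch of that idea, not part of the original guide: the helper name load_api_key and the variable name OPENAI_API_KEY are assumptions chosen for illustration.

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable, or raise if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key

# Demonstration only: a placeholder is used when no real key is set.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
api_key = load_api_key()

# In the guide's code, this value would take the place of the hard-coded string:
# openai.api_key = api_key
```

If the key is stored in a .env file, a load_dotenv call like the one in the snippet above copies it into the environment before load_api_key runs.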

One thing to note is that the messages parameter of the openai.ChatCompletion.create function takes a list of dictionaries as input, so if you pass a list of dictionaries directly, you can remove the line of the function that builds messages from the prompt. An example of such a list is shown below:

messages = [
  {"role": "system", "content": "You are a friendly chatbot."},
  {"role": "user", "content": "tell me a joke"},
  {"role": "assistant", "content": "Why did the chicken cross the road"},
  {"role": "user", "content": "I don't know"}
]
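
A variant of get_completion that accepts such a list directly might look like the sketch below. The function name get_completion_from_messages, the create parameter, and the fake_create stand-in are assumptions added here so the example can be tried without an API key; when create is omitted, the sketch falls back to openai.ChatCompletion.create as used elsewhere in this guide.

```python
from types import SimpleNamespace

def get_completion_from_messages(messages, model="gpt-3.5-turbo", create=None):
    """Send a ready-made message list and return the assistant's reply text."""
    if create is None:
        # Default to the real API call; imported lazily so the sketch
        # can also run offline with a stand-in function.
        import openai
        create = openai.ChatCompletion.create
    response = create(model=model, messages=messages)
    return response.choices[0].message["content"]

def fake_create(model, messages):
    """Offline stand-in for the API (assumption): echoes the last message."""
    last = messages[-1]["content"]
    message = {"role": "assistant", "content": f"You said: {last}"}
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "tell me a joke"},
]
reply = get_completion_from_messages(messages, create=fake_create)
print(reply)
```

Because the whole list is sent on every call, earlier turns give the model the context it needs to continue the conversation.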

Lastly, the function below automates the collection of user prompts and assistant responses, so you can interact with the chatbot without calling the function again for every message. An example of how to use it is shown below:

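import panel as pn  # GUI

panels = []  # collect display rows
context = [ {'role':'system', 'content':"""Hello"""} ]  # accumulate messages

def collect_messages(_):
    # Assumed sketch: the original definition of this callback is not shown in this guide.
    # It relies on the openai module imported and configured earlier.
    prompt = inp.value
    inp.value = ''
    context.append({'role': 'user', 'content': prompt})
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=context)
    reply = response.choices[0].message["content"]
    context.append({'role': 'assistant', 'content': reply})
    panels.append(pn.Row('User:', pn.pane.Markdown(prompt, width=600)))
    panels.append(pn.Row('Assistant:', pn.pane.Markdown(reply, width=600)))
    return pn.Column(*panels)

inp = pn.widgets.TextInput(value="Hi", placeholder='Enter Your Query here')
button_conversation = pn.widgets.Button(name="Lets Chat")
interactive_conversation = pn.bind(collect_messages, button_conversation)

dashboard = pn.Column(
    inp,
    pn.Row(button_conversation),
    pn.panel(interactive_conversation, loading_indicator=True, height=300),
)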
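
To see how the context list grows from turn to turn without a GUI or an API key, the bookkeeping can be tried on its own. The sketch below is an illustration under stated assumptions, not the guide's original code: chat_turn and canned_reply are hypothetical names, and canned_reply stands in for the real model call.

```python
def chat_turn(context, prompt, reply_fn):
    """Record one user prompt and the assistant's reply in the shared history."""
    context.append({"role": "user", "content": prompt})
    reply = reply_fn(context)  # the model sees the whole history so far
    context.append({"role": "assistant", "content": reply})
    return reply

def canned_reply(messages):
    """Stand-in for the model: reports how many user messages it has seen."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"reply {user_turns}"

context = [{"role": "system", "content": "Hello"}]
chat_turn(context, "Hi", canned_reply)
chat_turn(context, "tell me a joke", canned_reply)

for message in context:
    print(message["role"], "->", message["content"])
```

Swapping canned_reply for a function that sends the list to gpt-3.5-turbo gives the same flow against the real model.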

In conclusion, this guide has covered Large Language Models (LLMs) from the basics to real-life applications and their use within a programming language. We've delved into the theory behind LLMs, their historical context, the specifics of fine-tuning, and a detailed overview of their practical use cases. We've explored the nuances of popular models, paving the way for our future exploration of Bart, Llama, and GPT-J in programming languages. Yet, our journey isn't over. In the coming Advanced Developer Section, we will look into Langchain, Lambda, Pinecone, and other innovative tools and frameworks. It's an exciting era of technological advancement, and we look forward to journeying further with you, revealing the transformative potential of AI in the realm of programming.

Acknowledgment: This guide was skillfully crafted with the help of Saud M.
