
GPT Language Models deployed via Azure AI

SoftwareOne blog editorial team

The spectacular successes of AI applications such as ChatGPT are based on high-performance language models, known as large language models (LLMs). In this article, we explain how LLMs work, how the GPT models were developed – and how you can make the most of their capabilities. 

What are language models?

Generative AI language models, i.e. AI models that can not only process (“understand”) language but also generate it, are the result of NLP (Natural Language Processing) research on how natural language is processed by machines. After all, it is our languages that allow us to communicate most efficiently – not only with one another, but also with machines (e.g. Google Assistant or Alexa).

The most powerful AI language models today are called large language models (LLMs). LLMs are large-scale deep learning models based on deep neural networks, often comprising millions or even billions of parameters (the adjustable weights of their nodes or “neurons”). In addition, they have been trained on vast amounts of text data, mostly from the internet. By analysing this body of data, they can capture and statistically model complex patterns that reflect syntactic language structures as well as semantic relationships between words. LLMs can thus perform a wide range of linguistic tasks, from text analysis (categorisation, sentiment analysis) through summaries and translations to the generation of product texts or blog articles. At first glance, their output is often indistinguishable from that of human authors.
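The idea of statistically modelling which words tend to follow which can be illustrated with a deliberately tiny sketch – a bigram model over a toy corpus. (Real LLMs learn far richer patterns with billions of parameters; the corpus and function names here are invented for illustration.)

```python
from collections import Counter, defaultdict

# toy training corpus – real LLMs train on hundreds of gigabytes of text
corpus = "the cat sat on the mat the cat ate".split()

# count how often each word follows another (the "statistical patterns")
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Probability distribution over the words seen after `word`."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# after "the", "cat" is twice as likely as "mat" in this toy corpus
probs = next_word_probs("the")
```

Generating text then simply means repeatedly sampling a likely next word – the same principle, at vastly greater scale and with learned context, that lets an LLM continue a prompt.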

GPT-X: The Transformer Language Models from OpenAI

Among the best-known LLMs are the GPT models from the US company OpenAI – the technology behind the popular chatbot ChatGPT. (You can find more information about OpenAI in this blog post.)

“GPT” stands for “Generative Pre-Trained Transformer”, which explains the principle: the models achieve their high performance and versatility through extensive pre-training using the Transformer architecture.

The Transformer architecture for deep learning on sequences (such as text) was introduced by Google researchers in 2017. Its core concept is called “attention” and is modelled on the attention mechanism in cognitive systems. Thanks to this mechanism, Transformers outperform previous models and, unlike them, do not require complex and resource-intensive network architectures such as RNNs (Recurrent Neural Networks) or CNNs (Convolutional Neural Networks).
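As an illustration (a minimal sketch, not OpenAI’s implementation), the central “scaled dot-product attention” operation of the Transformer can be written in a few lines of NumPy: each output token becomes a weighted mix of all value vectors, with weights derived from query–key similarity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weights the value vectors V by the similarity between
    queries Q and keys K (softmax over the scaled dot products)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # softmax over keys -> attention weights (each row sums to 1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# toy self-attention: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))  # in real models, Q, K, V are learned projections of X
out, w = scaled_dot_product_attention(X, X, X)
```

Because every token attends to every other token in parallel, this operation maps well onto GPUs – one reason Transformers scaled so much better than the sequential RNNs they replaced.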

This architecture has revolutionised NLP research and is used in the training of many modern language models. The first pre-trained Transformer model was OpenAI’s GPT (2018). It combined the Transformer architecture with a semi-supervised training approach: unsupervised pre-training followed by fine-tuning with prepared (labelled) data for specific tasks. This principle made it easier to adapt the model to tasks it had not been specifically trained for.

The successors in the GPT series were, above all, ever bigger. GPT-2 (2019) had ten times as many parameters as GPT-1 – 1.5 billion – and was trained on 40 gigabytes of data (8 million web pages); GPT-3 (2020) had an incredible 175 billion parameters and 570 gigabytes of training data. As it turned out, this scaling enabled some sensational performances across an increasing variety of tasks.

Language and code

GPT-3 not only produced almost flawless texts – linguistically speaking, at least; in terms of content, the AI models are still not perfect. With appropriate training, the model can also handle program code. In 2021, the first GPT-3 models specialised in code appeared, called “Codex”. Codex was widely criticised, among other things for serious bugs. OpenAI has marked the Codex models as deprecated since March 2023 and points users to its new chat models, which are said to handle similar tasks without special training (more on this in a moment).

Microsoft – OpenAI’s largest funder – exclusively licensed GPT-3 in 2020 and has since used GPT-X in several of its products, including its Copilot AI assistants. One of the first was GitHub Copilot, based on Codex, in 2021. It translates tasks given in natural language into code and is designed to make developers’ jobs easier and faster. The new GitHub Copilot X with GPT-4 has been available since March 2023.

Ever more versatile: InstructGPT, ChatGPT and GPT-4

The GPT-3 models were trained to complete user input (prompts), which made them cumbersome to use and their results often unsatisfactory. OpenAI’s successor models, called InstructGPT, are instead trained with human feedback to follow instructions. Even the first generation of these models, GPT-3.5, released in early 2022, produced more relevant output with fewer errors while using significantly fewer parameters. In addition, OpenAI now subjects its models to an alignment process after training to promote factual fidelity and “desired behaviour”.

In late 2022, ChatGPT, a version of GPT-3.5 optimised for chat, was released. Two additional chat-enabled models followed in March 2023, GPT-3.5-Turbo and GPT-4, both of which are highly efficient task-independent learners, capable of mastering new tasks without much training. According to OpenAI, GPT-4 shows human-level performance in various examinations. It is also designed as a “multimodal” model, capable of handling both text and image input – but this feature is not yet available (as of August 2023).

In addition to GPT-X, OpenAI offers a number of other AI models. Of particular interest to businesses are Whisper (speech recognition), DALL·E (image generation from text descriptions) and CLIP (trained simultaneously with image data and associated descriptions).

Azure OpenAI Service 

If you want to use OpenAI’s innovative AI models productively, take a look at the Azure OpenAI Service. Azure OpenAI is part of Azure AI – Microsoft’s portfolio of cloud services for AI applications, which offers infrastructure services for machine learning and deployment as well as numerous developer tools and pre-trained AI models for various tasks. 

Microsoft Azure customers have exclusive access to the current OpenAI models, including ChatGPT and GPT-4 – and, since July 2023, also Meta’s LLaMA – and, as usual with Azure, in a manner optimised for enterprise use with extensive compliance, privacy and security features. In the blog post “Azure OpenAI Service: Background, Capabilities, and Advantages over OpenAI”, we also explain why enterprise customers should choose the Azure OpenAI Service over OpenAI itself for productive AI applications.
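As a minimal sketch of what using the service can look like from Python: with Azure OpenAI you address a model via the deployment name you chose when deploying it in your Azure resource, not the raw model name. The deployment name, environment variables and API version below are placeholders, not values from this article.

```python
def build_chat_request(deployment, user_message):
    """Assemble the payload for an Azure OpenAI chat completion.
    `deployment` is the name given to the model when it was deployed
    in Azure OpenAI Studio (a hypothetical name is used below)."""
    return {
        "engine": deployment,  # Azure routes by deployment name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.2,  # low value -> more deterministic answers
    }

request = build_chat_request(
    "my-gpt-35-turbo",  # hypothetical deployment name
    "Summarise what a large language model is in one sentence.",
)

# The actual call requires a provisioned Azure OpenAI resource, e.g. with
# the openai Python SDK (0.x) configured for Azure:
#
#   import os, openai
#   openai.api_type = "azure"
#   openai.api_base = os.environ["AZURE_OPENAI_ENDPOINT"]
#   openai.api_version = "2023-05-15"
#   openai.api_key = os.environ["AZURE_OPENAI_KEY"]
#   response = openai.ChatCompletion.create(**request)
#   print(response.choices[0].message.content)
```

Keys and endpoints stay within your Azure subscription, which is part of what makes the service attractive for the enterprise compliance and security requirements mentioned above.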


Find out how to successfully use AI in your organisation!

By attending our SoftwareOne Internal Knowledge Management & AI Workshop, you will gain the knowledge and strategic insight you need to successfully deploy AI in your organisation. Together, we will explore the fundamentals of big data and AI, and provide you with concrete steps and best practices for using AI as a valuable tool to improve business performance.


Author

SoftwareOne blog editorial team

We analyse the latest IT trends and industry-relevant innovations to keep you up-to-date with the latest technology.