
LARGE LANGUAGE MODELS - The State Of The Art

An Introduction.
[Image: AI Revolution]

As AI technology continues to evolve, language generation models have emerged as a promising area of research. These models are designed to generate human-like text based on a given prompt. They have the potential to revolutionize the way we interact with machines and automate many tasks that previously required human intervention.

One of the most significant advancements in this field is the development of large language models (LLMs). These models are trained on vast amounts of text data and can generate coherent and fluent text that is often indistinguishable from human writing. LLMs can be used for a wide range of applications, including chatbots, search engines, summarization tools, and even code generation.

However, LLMs are not without their challenges. They can sometimes produce biased or problematic outputs due to the data they are trained on. Additionally, there are concerns about the environmental impact of training these models, which require massive amounts of computational resources.

Despite these challenges, many new prompt-driven solutions have appeared on the internet. One such example is ChatGPT, a chatbot developed by OpenAI that uses the GPT-3 (and now GPT-4) LLM to generate responses to user queries. Another example is AI Dungeon, an interactive fiction game that uses LLMs to generate storylines based on user inputs.

One point of confusion with LLMs is the public's tendency to conflate the LLM with the application that wraps it. You will often hear ChatGPT and GPT-3 used interchangeably in discussions, but these are actually distinct systems - like a finance system and the database engine on which it runs. ChatGPT is a chatbot that uses GPT-3+ as its "database". As such, ChatGPT is intended to be publicly facing, has filters and constraints built in, and uses (at the time of writing) about 20 billion parameters versus GPT's 175+ billion. Bing AI Chat is similarly a chatbot that wraps GPT in a conversational interface.
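To make the distinction concrete, here is a minimal sketch in which all names are hypothetical: `complete` stands in for the raw LLM, and `ChatApp` stands in for the product layered on top. The model only completes text; the conversation history, persona, and safety filtering live entirely in the wrapper.

```python
def complete(prompt: str) -> str:
    """Placeholder for the raw LLM: takes text, returns a continuation."""
    return "...model-generated continuation of: " + prompt[-40:]

class ChatApp:
    """The application layer: history, persona, and output filtering."""

    SYSTEM_PROMPT = "You are a helpful assistant."
    BLOCKED_TERMS = {"example-blocked-term"}  # hypothetical filter list

    def __init__(self):
        self.history: list[str] = [self.SYSTEM_PROMPT]

    def ask(self, user_message: str) -> str:
        self.history.append("User: " + user_message)
        # The app assembles the full prompt; the model only ever sees text.
        reply = complete("\n".join(self.history) + "\nAssistant:")
        # Post-filtering belongs to the application, not the model.
        if any(term in reply.lower() for term in self.BLOCKED_TERMS):
            reply = "[response withheld by application filter]"
        self.history.append("Assistant: " + reply)
        return reply

app = ChatApp()
print(app.ask("What is an LLM?"))
```

Swap `complete` for a call to a real model API and the division of labour stays the same: the chatbot is the application; the LLM is the engine underneath it.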

LLMs are semantic probability engines that generate text based on the likelihood of that text following a previously generated sequence of text, or a sequence supplied in the form of a prompt. They are essentially probabilistic and non-deterministic. LLMs do not have a notion of "understanding" or knowledge per se; they treat a body of data as a recommendation engine for how the elements it contains relate to one another. They generate text that is statistically consistent with the prompt provided.
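The mechanism can be illustrated with a toy sketch. The `NEXT_WORD` table below is a hypothetical stand-in for a trained model: it scores candidate next words given the current word, and generation simply samples from those scores repeatedly - which is why repeated runs can produce different output.

```python
import random

# Hypothetical "model": relative likelihood of each next word given the
# current word. A real LLM computes these scores with a neural network.
NEXT_WORD = {
    "the":    {"cat": 0.5, "dog": 0.3, "market": 0.2},
    "cat":    {"sat": 0.6, "ran": 0.4},
    "dog":    {"ran": 0.7, "sat": 0.3},
    "market": {"closed": 1.0},
    "sat":    {"down": 1.0},
    "ran":    {"away": 1.0},
}

def generate(seed_word: str, length: int = 4) -> str:
    words = [seed_word]
    for _ in range(length):
        dist = NEXT_WORD.get(words[-1])
        if not dist:
            break  # no known continuation for this word
        choices, weights = zip(*dist.items())
        # Sampling, not lookup: this is the source of non-determinism.
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

for _ in range(3):
    print(generate("the"))  # e.g. "the cat sat down", "the dog ran away"
```

A real LLM scores tens of thousands of tokens conditioned on the whole preceding context rather than one word, but the generate-by-sampling loop is the same in spirit.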

Surprisingly, this characteristic makes LLMs better at creating (where factual linking is not required) than at interpreting, where accuracy and correctness are paramount. Unlike earlier solutions, such as expert systems, the probabilistic nature of LLMs means they are weaker when it comes to the goals of correctness and accuracy. When an LLM deviates from reality by producing a convincing answer to a question that is not grounded in known fact, we call that "hallucination". The better the training set, the less likely such hallucinations are to occur, but "better" is a subjective concept and requires more consideration. As in Total Quality Management, where quality is defined not in absolute terms but in "fit for purpose" terms, the quality of a training set is determined by how fit it is for the targeted purpose. Typically, deficiencies in the training set fall into one or more of the following categories:

  • Overfitting: Overfitting occurs when the data is too narrowly defined and the model is overtrained on that narrow data set. It is the typical "you don't know what you don't know" problem of "if all you have is a hammer, then every screw is a nail". Overfitting is the result of the neural network learning a non-diverse data set and applying it to every problem encountered. In the early days of character recognition networks it arose when a network was trained on perfectly formed and positioned letters, making it incapable of recognising slightly imperfect or oddly positioned letters. The solution was to introduce errors into the training set - badly formed letters and a wider range of letters. Of course it is a balancing act, as too much noise in the dataset prevents the network from reliably distinguishing one letter from another. An overfitted network cannot generalise well to new data (see the sketch after this list).
  • Data Errors: Data that is too noisy, mislabeled, or misclassified will obviously lead to the model simply learning the wrong thing. If I teach the model that a cat is a fish, I cannot expect it to recognise that a cat is not a fish when I show it a fish. Aside from the obvious problem of mislabeling, bias in the data set leads to a similar problem: if the model learns one view of the world much more strongly than another, it will tend to interpret what it sees, on a probabilistic basis, as belonging to the view of the world that dominated its training set. Further, as the model has no "instinct" for right or wrong, no sense of credible versus incredible, a set of lies learned is as much a truth to it as a set of truths. In a sense this is no different from a human: we grow and mature within a culture that holds certain values as important, and as adults we continue to interpret the world in terms of those cultural values. That does not make us right - they could be complete hallucinations of reality - but they are truths to us and to those with whom we share a common culture.
  • Data sparsity: The wider the data domain, the more likely any subdomain will be sparsely covered. Where data is missing or insufficiently detailed, the model is likely to invent what is required, using other data to bridge the gap. This is actually desired behaviour in many cases, but it can lead to hallucination when the bridging performed is in error. Clearly the solution is to refresh the training dataset and fill the holes in its knowledge, but this may be easier said than done, as identifying that holes exist can itself be a problem.
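The overfitting point above can be illustrated with a small sketch using invented toy data: a degree-4 polynomial forced through every noisy training point achieves zero training error, yet goes badly wrong just outside the training range, while a simple least-squares line generalises far better.

```python
# Toy data: y is roughly 2x, with a little noise (invented numbers).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

def overfit(x):
    """Degree-4 Lagrange polynomial through every training point:
    zero training error, but it has memorised the noise."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line(x):
    """Least-squares straight line: some training error,
    much better generalisation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
            sum((a - mx) ** 2 for a in xs)
    return my + slope * (x - mx)

# Just outside the training range the overfitted model diverges rapidly.
for x in (5.0, 6.0):
    print(f"x={x}: overfit={overfit(x):6.2f}  line={line(x):6.2f}  true~{2 * x:.2f}")
```

The analogue for an LLM is a model that reproduces its training text too faithfully and fails on prompts that differ even slightly from what it has seen; the remedy described above - deliberately adding varied and imperfect examples - is the same balancing act.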

Applications such as ChatGPT and AI Dungeon demonstrate the potential of LLMs for creating engaging and interactive experiences for users. However, they also highlight the need for responsible development and deployment of these technologies. As AI technology continues to advance, it is essential that we consider the ethical implications of these systems and ensure that they are developed in a way that benefits society as a whole.

Language generation models represent an exciting area of research in AI technology. With the development of large language models and new solutions appearing on the internet that use prompts to create an output, we are seeing new possibilities for automating tasks and creating engaging experiences for users. However, we must also be mindful of the challenges associated with these technologies and work towards responsible development and deployment.


...Next: Overview of current LLM Solutions....