18. Related Topics

This chapter delves into the evolution of the Transformer model, exploring its potential applications beyond natural language processing.

18.1. Large Language Models (LLMs)

LLMs, a type of AI trained on massive amounts of text data, learn the patterns of human language and generate human-quality text. They have already caused a significant impact on society.

According to the paper “Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond”, three main streams of LLM technology trends exist:

  • Encoder-Only
  • Encoder-Decoder
  • Decoder-Only

See the following figures cited from the paper (by J. Yang, et al.) and article (by Boston Consulting Group):

Refer to the original figure for details.

Nearly all of these trends are rooted in the Transformer model1. For example, BERT leverages only the encoder component of the Transformer model, while GPT2 solely employs the decoder component.

Currently, the most powerful models are commercially developed. However, numerous open-source models have also been released. Additionally, several open-source models, e.g. ollama, are available that can be run on personal computers or other consumer hardware.

18.1.1. MultiModal Large Language Models

Research publications on multimodal LLMs have seen a significant surge since 2023.

The following figure is cited from the survey:

Reference

18.1.2. Retrieval-Augmented Generation (RAG) for LLMs

RAG is a technique used to enhance the quality and reliability of text generated by LLMs by incorporating factual information retrieved from external knowledge sources.

The following figure is cited from the survey:

Reference

18.1.3. LLM based Multi-Agents

As LLMs evolved, systems have been proposed in which multiple LLMs work together to complete tasks, rather than a single LLM performing all task processing. A major approach is multi-agent based LLMs.

Early LLM-based multi-agent systems were straightforward: a coordinator managed multiple expert agents who could answer questions within their specific domains.

However, LLM-based multi-agent systems are rapidly being improved. For example, multiple agents (LLMs) are now able to debate together to resolve given tasks.

Moreover, to improve the discussion processes, reinforcement learning techniques are leveraged, just as reinforcement learning surprisingly improved machine game-playing, its various techniques are be applied to improve the discussion skills of agents in LLM-based multi-agent systems.

I believe that by applying these techniques, machines can eventually acquire human-like thinking methods, such as chain-of-thought reasoning.

18.1.4. LLM for Site Reliability Engineering (SRE)

Studies on AIOps, Artificial Intelligence for IT Operations, have been conducted since the relatively early days in the IT operation field3. These studies have focused on areas such as failure management and resource provisioning, mainly using traditional machine learning methods like anomaly detection and root cause analysis.

Since 2023, the AIOps field has seen the introduction of LLM techniques.

While research with LLMs in this area is still in its early stages, I believe AI has the power to transform the field of Site Reliability Engineering (SRE) from a discipline heavily reliant on individual skills to one that incorporates more engineering and scientific principles.

18.2. Computer Vision

Similar to the shift from RNNs to Transformers in NLP, computer vision has seen a transition from convolutional neural network (CNN) architectures to transformer-based architectures with the introduction of Vision Transformer (ViT) in 2020.

18.3. Time-series Analysis

Since sentences in natural language can be viewed as a type of time series data, it is natural to consider applying the Transformer model, originally introduced for NLP, to traditional time series analysis domains.

Additionally, a paper exploring time series analysis using Large Language Models (LLMs) was published in 2024.

References

IMO: While Transformer and LLM analyses show promise, their lack of interpretability is a concern. Traditional time series analysis methods, on the other hand, provide valuable insights based on a strong mathematical foundation. In my opinion, we shouldn’t abandon traditional methods just yet.

18.4. Reinforcement Learning

The “third golden age” of AI arrived with breakthroughs like Deep Q-Networks (DQNs) conquering Atari games and AlphaGo defeating the Go world champion. These achievements popularized the term “Deep Learning.”

While these achievements are rooted in the field of reinforcement learning (RL), the impact of Transformers is starting to influence RL research as well.

Here are two notable examples:

  • Decision Transformer (2021)
    This model tackles offline RL4, where the agent cannot interact with the environment during training. It leverages Transformer architecture to analyze past experiences and make optimal decisions in unseen situations.
  • MAT: Multi-Agent Transformer (2022)
    MAT addresses the complexities of situations where multiple agents interact and compete or cooperate. This model utilizes Transformer mechanisms to capture the complex dynamics and dependencies between agents.

18.5. Transformer Alternatives

Many alternative models to the Transformer are continuously being proposed.

Here are some recent examples:

References

While these alternatives are not yet mainstream, it does not imply their inferiority to the Transformer.

Currently, research resources primarily focus on the Transformer and its variants. However, if one of these alternatives or a future model proves significantly more effective than the Transformer, it could potentially become the dominant architecture.


  1. ELMo is LSTM-based model. In commercial models, it is impossible to say definitively whether they are transformer-based, as some do not disclose details about their internal architecture. ↩︎

  2. It’s fascinating to me that GPT-2, a decoder built from Transformer modules, presents only one formula in its paper: This is a familiar representation of language models for us.
    It reveals that N-gram, RNN-based language models, and LLMs all share the foundation of statistical language models, even though LLMs appear like technology from another world or magic, much like how DNA forms the basis for all life, from bacteria to human being. ↩︎

  3. As a database engineer, I initially became interested in Deep Learning (and Reinforcement Learning) to explore the potential of applying LLMs to system administration tasks. ↩︎

  4. Online reinforcement learning agents interact with an environment in real-time. For example, in online learning, an AI agent plays the game of Go in real-time, adjusting its strategy based on the outcomes of each move during the game.
    In contrast, offline reinforcement learning uses a fixed dataset of pre-collected experiences to train the agent, such as learning from a historical dataset of Go games. This is useful for complex or costly environments where direct interaction is impractical, like optimizing industrial processes based on historical data. ↩︎