There are multiple families of generative AI models, such as Diffusion models, Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs) (AltexSoft, 2022). It has been demonstrated that combining two or more of these models can produce better (i.e., more human-like) results. Large "foundation models" are built from trillions of pieces of human-generated data and contain billions of parameters that capture context; they are trained and retrained until they reach accuracy and performance levels close to those of humans (AltexSoft, 2022; McKinsey, 2023). Thus there is constant learning and re-learning within these models. Examples of foundation models are GPT, Llama, BLOOM, FLAN-T5, and BERT (Slashdot Media, 2023).
Interaction with these models typically involves three elements (a brief sketch follows the list):
• A Prompt (such as a question like “What is Titan?”), which is fed into
• A Model (such as the ones above), which processes the question and outputs
• A Completion, which is the response (such as “Titan is the largest moon of
Saturn…”)
This example uses a large language model (LLM). Generative AI applications built on LLMs can write essays, summarize text, translate sentences into other languages, translate natural language into machine code, extract information about named people, and more. The important point here is the training of these LLMs. That process involves trillions of pieces of text data, collected and processed over several months, and models with billions of parameters that capture context. The training typically uses an architecture known as the "Transformer," introduced by Google researchers in 2017 (Vaswani et al., 2017a). The model essentially "learns" the strength of relationships between word pairs in texts using "attention weights," together with a positional encoding of the words. In this manner, the model is able to predict the "next word" given a prompt.
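For readers who want a concrete picture of these two mechanisms, the following is a minimal NumPy sketch of sinusoidal positional encoding and scaled dot-product attention, the building blocks named in (Vaswani et al., 2017a). The sequence length, vector dimension, and random word vectors are assumptions chosen purely for illustration; a real Transformer uses learned projections and far larger dimensions.

    # Minimal NumPy sketch of sinusoidal positional encoding and
    # scaled dot-product attention. Illustrative only, not production code.
    import numpy as np

    def positional_encoding(seq_len, d_model):
        """Sinusoidal encoding that injects each word's position into its vector."""
        pos = np.arange(seq_len)[:, None]      # (seq_len, 1)
        i = np.arange(d_model)[None, :]        # (1, d_model)
        angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angle[:, 0::2])
        pe[:, 1::2] = np.cos(angle[:, 1::2])
        return pe

    def scaled_dot_product_attention(Q, K, V):
        """Attention weights = softmax(Q K^T / sqrt(d_k)); output = weights @ V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ V, weights

    # Toy example: 4 "words", each represented by an 8-dimensional vector
    seq_len, d_model = 4, 8
    rng = np.random.default_rng(0)
    x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)

    # In a real Transformer, Q, K, and V come from learned projections of x;
    # here x is reused directly to keep the sketch short.
    output, attn = scaled_dot_product_attention(x, x, x)
    print(attn.round(2))  # each row shows how strongly a word "attends" to the others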
A more detailed discussion of LLMs and the Transformer architecture is outside the scope of this paper. For interested readers, a good place to start is the seminal 2017 paper "Attention Is All You Need" by Ashish Vaswani et al. (Vaswani et al., 2017b). However, the important takeaways from this discussion of Generative AI are the following:
• The "next word" predicted by these models depends on context and does not follow any hierarchical process of deduction
• This context is determined by how "close" a word is to other words in n-dimensional space (i.e., a space with numerous parameters); a brief sketch of this notion of closeness follows the list
• Very large amounts of data and very large numbers of parameters are required to build and train these models
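As a brief illustration of the notion of "closeness" mentioned in the second point above, the sketch below represents words as vectors and uses cosine similarity to measure how near two words lie in that space. The four-dimensional vectors are hypothetical values invented for illustration; real models learn embeddings with hundreds or thousands of dimensions rather than hand-written values.

    # Minimal sketch of "closeness" in n-dimensional space: words are
    # represented as vectors, and cosine similarity measures how close they are.
    # The vectors below are made up purely for illustration.
    import numpy as np

    def cosine_similarity(a, b):
        """Close to 1.0 means pointing in the same direction; near 0.0 means unrelated."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Hypothetical 4-dimensional embeddings
    moon   = np.array([0.9, 0.1, 0.8, 0.0])
    titan  = np.array([0.8, 0.2, 0.9, 0.1])
    banana = np.array([0.1, 0.9, 0.0, 0.8])

    print(cosine_similarity(titan, moon))    # high: "Titan" is close to "moon"
    print(cosine_similarity(titan, banana))  # low: "Titan" is far from "banana"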