Understanding Text Generation: From Markov Chains to GPT
In this blog post, I’ll share what I’ve learned about text generation. Diving into the world of AI and machine learning, I wanted to understand how computers can generate text, and along the way I came across a range of methods and models that I’m excited to share with you. We’ll start with a simple technique called Markov Chains and then move on to more advanced models like OpenAI’s GPT, uncovering how computers create text and why it matters so much in today’s world of technology.
What are Markov Chains?
Markov Chains are mathematical systems that model transitions from one state to another within a sequence of events. Named after the Russian mathematician Andrey Markov, these chains are characterized by the Markov property, which states that the probability of transitioning to the next state depends only on the current state and not on the sequence of events leading to it.
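To put the Markov property in notation: if X₁, X₂, X₃, … are the states of the chain, then
P(Xₙ₊₁ | Xₙ, Xₙ₋₁, …, X₁) = P(Xₙ₊₁ | Xₙ)
In other words, the entire history collapses into the current state, and that one simplification is what makes Markov Chains so cheap to build and sample from.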
Understanding Markov Chain Text Generation:
In text generation, Markov Chains are used to model the probability of transitioning from one word (or token) to another based on a given corpus of text. The basic idea is to construct a dictionary where each word serves as a key, and the associated value is a list of words that often follow the key word in the training text. When generating text, we start with a seed word, look up its associated list of possible next words, and randomly select one of them to continue the sequence. By iteratively selecting words based on their probabilities in the Markov Chain, we can generate new text that mimics the style and patterns of the training data.
For instance, consider a text where each word is treated as a token; for each token, we record every word that follows it anywhere in the text.
Text Example: “I like to play soccer and I would like to sing a song.”
For the token “I”, the potential next words would be [“like”, “would”].
For the token “like”, which appears twice and is followed by “to” both times, it would be [“to”, “to”].
For “to”, the possibilities would be [“play”, “sing”], and so forth…
The resulting dictionary would resemble the structure below. Since “like” and “would” each follow “I” once, if the input token is “I” there’s a 50% probability that the next word will be “like” and a 50% probability that it will be “would” in the generated text.
let inputText = 'I like to play soccer and I would like to sing a song.';
let chain; // After training on inputText, the chain will look like this:
// chain = {
//   "I": ["like", "would"],
//   "like": ["to", "to"],
//   "to": ["play", "sing"],
//   "play": ["soccer"],
//   "soccer": ["and"],
//   "and": ["I"],
//   "would": ["like"],
//   "sing": ["a"],
//   "a": ["song."]
// }
// Note: "like" appears twice, so "to" is stored twice; duplicates are how
// more frequent followers become more likely to be picked. The trailing
// period stays attached to "song." because we only split on whitespace.
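To see where the 50/50 split comes from, here is a small sketch (a hypothetical helper, not part of the model we build below) that turns one of these next-word lists into transition probabilities:
// Hypothetical helper: turn a list of observed next words into probabilities.
// Each occurrence counts once, so words that follow more often get higher odds.
function transitionProbabilities(nextWords) {
  const counts = {};
  for (const word of nextWords) {
    counts[word] = (counts[word] || 0) + 1;
  }
  const probs = {};
  for (const word in counts) {
    probs[word] = counts[word] / nextWords.length;
  }
  return probs;
}
console.log(transitionProbabilities(['like', 'would'])); // { like: 0.5, would: 0.5 }
console.log(transitionProbabilities(['to', 'to']));      // { to: 1 }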
Sample JavaScript code that generates text using a Markov Chain:
// Define a class for Markov Chain text generation
class MarkovChain {
  constructor() {
    this.chain = {}; // Empty dictionary to store the Markov Chain transitions
  }

  // Train the model with sample text
  train(text) {
    // Trim and split the input text into an array of words
    const words = text.trim().split(/\s+/);
    // Record, for every word, each word observed to follow it
    for (let i = 0; i < words.length - 1; i++) {
      const currentWord = words[i]; // Current word
      const nextWord = words[i + 1]; // Word that follows it
      // Initialize an empty list the first time we see this word
      if (!this.chain[currentWord]) {
        this.chain[currentWord] = [];
      }
      // Add the next word to the list of words following the current word
      this.chain[currentWord].push(nextWord);
    }
  }

  // Generate up to `length` additional words, starting from a seed word
  generate(seed, length = 20) {
    let currentWord = seed; // Start with the seed word
    let generatedText = seed; // Initialize the generated text with the seed word
    for (let i = 0; i < length; i++) {
      // Look up the list of words observed to follow the current word
      const nextWords = this.chain[currentWord];
      // Stop if the current word was never seen or leads nowhere
      if (!nextWords || nextWords.length === 0) {
        break;
      }
      // Randomly select the next word; duplicate entries make frequent
      // followers proportionally more likely to be chosen
      const nextIndex = Math.floor(Math.random() * nextWords.length);
      const nextWord = nextWords[nextIndex];
      generatedText += ' ' + nextWord; // Append the next word
      currentWord = nextWord; // Move the chain forward
    }
    return generatedText; // Return the generated text
  }
}
// Example usage
const textData = `
FROM off a hill whose concave womb reworded
A plaintful story from a sistering vale,
My spirits to attend this double voice accorded,
And down I laid to list the sad-tuned tale;
Ere long espied a fickle maid full pale,
Tearing of papers, breaking rings a-twain,
Storming her world with sorrow's wind and rain.
Upon her head a platted hive of straw,
Which fortified her visage from the sun,
Whereon the thought might think sometime it saw
The carcass of beauty spent and done:
Time had not scythed all that youth begun,
Nor youth all quit; but, spite of heaven's fell rage,
Some beauty peep'd through lattice of sear'd age.
`;
// Create an instance of the MarkovChain class
const markovModel = new MarkovChain();
// Train the model with the sample text data
markovModel.train(textData);
// Generate text starting with the seed word 'Upon' and a length of 10 words
const generatedText = markovModel.generate('Upon', 10);
console.log(generatedText); // Output the generated text
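One possible run might print “Upon her head a platted hive of straw, Which fortified her”, though your output will vary from run to run, since each next word is chosen at random.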
- The generate method starts with the provided seed word.
- It then iterates for a specified length, generating the next word based on the Markov chain model.
- For each iteration, it looks up the currentWord in the Markov chain's dictionary to find the possible next words.
- If there are no next words or the currentWord is not present in the dictionary, it breaks out of the loop.
- If there are next words available, it randomly selects one of them to append to the generated text.
- It updates the currentWord to the newly generated word and continues the process until the desired length is reached.
This code creates a Markov Chain model and trains it with some sample text. Then, it generates new text based on a starting word. It’s a basic example, but it gives you an idea of how Markov Chains work in practice.
Moving to Advanced Models:
Now that we’ve got a grasp on basic text generation using Markov Chains, let’s talk about more advanced models like OpenAI’s GPT. These models are built on sophisticated neural network architectures, such as transformers, which can capture intricate patterns and long-range dependencies in text, and they have been trained on enormous amounts of text from the internet. Like a Markov Chain, GPT still predicts one token at a time, but instead of looking only at the previous word, it conditions each prediction on the entire context so far, which is what lets it produce coherent sentences and whole paragraphs.
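To make this concrete, here is a minimal sketch of calling a GPT model through OpenAI’s chat completions REST API from JavaScript. It assumes Node.js 18+ (for the built-in fetch) and an API key in an OPENAI_API_KEY environment variable; the model name below is just one example, so check OpenAI’s documentation for what is currently available.
// Minimal sketch: ask a GPT model to continue a piece of text.
// Assumes Node.js 18+ (built-in fetch) and an OPENAI_API_KEY env variable.
async function generateWithGPT(prompt) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo', // example model name; newer models may be available
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}
generateWithGPT('Continue this line in the style of Shakespeare: Upon her head a platted hive of straw,')
  .then(console.log);
Notice the contrast with our Markov Chain: there, we supplied the training text ourselves, while here all the heavy lifting happened during the model’s training, and we simply send a prompt over HTTP.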
Conclusion:
Text generation is a fascinating field that combines linguistics, statistics, and computer science. From simple models like Markov Chains to cutting-edge technologies like GPT, we’ve seen how computers can generate increasingly fluent and coherent text. As these models continue to develop, the possibilities for AI-driven text generation keep expanding. Whether it’s for chatbots, content creation, or language translation, text generation plays a crucial role in shaping the future of technology. I hope this blog has given you some insight into this exciting area of AI and machine learning. Thanks for joining me on this journey!
Reference Links
A Guide to Markov Chain and its Applications in Machine Learning
https://www.linkedin.com/learning/hands-on-ai-build-a-generative-language-model-from-scratch