Imagine having a superb assistant always at your side, capable of reading and translating hundreds of languages, screening important news, drafting emails and documents, reminding you of tasks and making suggestions, teaching you new skills, and even providing clever answers in a voice you love. Finding a human with all these abilities would be almost impossible, but with artificial intelligence (AI), such an assistant is no longer distant fiction. In fact, it is already becoming a reality in this decade, thanks to the AI revolution.
Not long ago, you and I might have been skeptical or never even considered the possibility of such an assistant. But with the emergence of ChatGPT and similar AI software, our doubts have been shattered, and we're witnessing a new era of technological advancement.
What is ChatGPT?
It is virtual assistant software released by the company OpenAI at the end of 2022 (see https://chat.openai.com/, or download the application for your phone), and it has caused a worldwide sensation: the press spills untold ink on it daily. Only two months after its launch, ChatGPT had reached 100 million users, a record for any software service.
The principle of using ChatGPT is very simple. After logging in, you can write any question or comment in Vietnamese, English, or any other common language, and ChatGPT will answer in the language you used. In many cases, the answer it gives is quite reasonable and useful. It is useful enough that programmers who need a small piece of code can ask GPT to write it for them, and doctors ask ChatGPT for explanations of diseases. There have even been many students who cheated on exams by having ChatGPT write essays they should have written themselves, forcing universities to set up measures to detect this. It would not be surprising if there were doctoral theses written by ChatGPT or similar virtual assistants!
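For programmers, the same assistant can also be reached programmatically rather than through the chat page. Below is a minimal sketch in Python against OpenAI's public chat completions endpoint; it assumes you have an API key stored in the OPENAI_API_KEY environment variable and the requests package installed, and the model name shown is illustrative and may change over time.

```python
import os
import requests

# Minimal sketch: send one question to OpenAI's chat completions endpoint.
# Assumes an API key in the OPENAI_API_KEY environment variable;
# the model name below is illustrative and may need updating.
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "user",
             "content": "Explain what a virtual assistant is, in one sentence."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```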
However, you must be very cautious when using ChatGPT and similar virtual assistants, because they can fabricate answers that sound plausible but are actually nonsense. For example, when asked to "give the list of scientific articles by the author XYZ," ChatGPT immediately produced article titles that were completely made up. The reason is that ChatGPT does not store those titles in its memory: it has merely seen many article titles during training, and what it retains is not the titles themselves but characteristic phrases. It then combines these phrases into invented titles that look like real articles!
Another example: when asked about how the severity of a skin disease is measured, ChatGPT answered that it is calculated by scoring the disease over four regions of the body, namely the head, the trunk, the limbs, and the nails, and then combining the scores. This is a fabricated answer (though those who do not know better may believe it) because the actual four regions are the head, the trunk, the upper limbs (arms), and the lower limbs (legs), not the nails. The virtual assistant somehow mixed up the formula, perhaps because it had repeatedly seen the word "nail" mentioned in connection with this skin disease.
According to an assessment by Microsoft, the company that has invested billions of dollars in OpenAI, the accuracy of ChatGPT's answers is only about 70%. Jean-Noël Barrot, France's Minister for Digital Transition, is not wrong when he says that ChatGPT "is just a parrot." These "parrots" are formidable nonetheless, because they learn new structures and information very quickly, becoming ever more sophisticated and smart. From GPT-2 (the second generation of GPT, which appeared in 2019) to GPT-3 (the generation behind the current ChatGPT), there was a huge leap in the quality of the answers. The upcoming GPT-4 is expected to be even better.
A Little History
The history of virtual assistants began in the 1960s, when Professor Joseph Weizenbaum at MIT developed a chatbot (dialogue software) named Eliza. Eliza's algorithm was capable only of simple tricks, such as rephrasing the user's input using sentence templates stored in its memory; it had no understanding of the meaning behind the text. Even so, it was convincing enough that Weizenbaum's own secretary reportedly asked to be left alone with Eliza for "private conversations."
Since then, the revolution in machine learning and computing power, especially in natural language processing, has completely changed the face of virtual assistants.
An important development in natural language processing by artificial intelligence was the Word2Vec method, which appeared in 2013. It converts words into vectors (lists of numbers of a fixed length, or equivalently points in a multi-dimensional Euclidean space), and computers find it far more convenient to work with numbers than with raw words. The relative positions of the vectors reflect the relationships between the corresponding words: for example, (man) - (woman) gives a vector similar to (prince) - (princess). Calculating with these vectors helps to identify relationships between the words in a sentence.
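To see this vector arithmetic in practice, here is a small sketch in Python using the gensim library and one of its downloadable sets of pretrained word vectors; the particular GloVe vectors named below are an assumption for illustration, and any pretrained word-embedding set would do.

```python
import gensim.downloader as api

# Download a small set of pretrained 50-dimensional word vectors.
# "glove-wiki-gigaword-50" is one of gensim's stock datasets; the exact
# choice of vectors is illustrative, not essential.
wv = api.load("glove-wiki-gigaword-50")

# The classic analogy: (king) - (man) + (woman) should land near (queen),
# i.e., (man) - (woman) is roughly parallel to (king) - (queen).
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```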
Before this, in the 1980s, a type of artificial neural network called the RNN (Recurrent Neural Network) was championed by the American psychologist David Rumelhart and his colleagues. It treats text or audio as a time series in which new words or sounds arrive one by one: the network processes each word or sound as it appears while carrying a memory of the ones before it. In 1997, two German researchers, Sepp Hochreiter and Jürgen Schmidhuber, made an important improvement to the RNN called LSTM (Long Short-Term Memory), which retains the "echo" of earlier words over longer stretches of text. Combining Word2Vec with LSTM produced the best natural language processing tools (chatbots, machine translation systems, etc.) of the 2010s.
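The core of an RNN can be written in a few lines: at each time step, the network mixes the current input with a hidden state carried over from the previous step, which is how it "remembers" earlier words. Here is a minimal Python sketch with made-up dimensions and random weights, an illustration of the recurrence only, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: each word is a 4-dimensional vector; hidden state has 8 units.
d_in, d_hidden = 4, 8
W_x = rng.normal(size=(d_hidden, d_in))      # input-to-hidden weights
W_h = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden weights (the "memory")
b = np.zeros(d_hidden)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current word with the echo of all previous ones.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Process a "sentence" of 5 word vectors, one at a time, in order.
sentence = rng.normal(size=(5, d_in))
h = np.zeros(d_hidden)
for x_t in sentence:
    h = rnn_step(x_t, h)
print(h)  # the final hidden state summarizes the whole sequence
```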
Transformer for Virtual Assistants
In 2017, researchers at Google and the University of Toronto introduced a new artificial neural network architecture called the Transformer. It became the basis for the next generation of natural language processing technologies, such as Google's BERT, OpenAI's GPT, and Facebook's BART. GPT, the technology underlying ChatGPT, stands for Generative Pretrained Transformer: "generative" because it can generate new text, and "pretrained" because it has been trained in advance on a huge dataset, absorbing many important concepts and structures before being trained further for specific fields.
The main difference between the Transformer and the RNN/LSTM is that the RNN/LSTM processes words sequentially, while the Transformer processes them in parallel. In addition, the Transformer computes attention and self-attention coefficients that measure which words matter most in the context of a passage, something the RNN/LSTM does not do. This approach captures grammatical structure more accurately, models the relationships between words better, and yields more accurate translations.
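The self-attention computation at the heart of the Transformer is compact enough to sketch directly. Following the scaled dot-product formula from the original 2017 paper, softmax(QK^T / sqrt(d)) V, here is a minimal numpy version with random toy matrices, for illustration only.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Project every word vector into query, key, and value vectors.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d = Q.shape[-1]
    # Attention scores: how relevant is every word to every other word,
    # computed for all pairs at once (unlike the sequential RNN).
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V  # each output word is a weighted mix of all the words

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))            # a "sentence" of 5 word vectors
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 8)
```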
Virtual assistants based on the Transformer architecture not only learn to answer questions correctly; they can also be personalized, adapting to different speaking styles, emotional states, and other variables. Although initially developed for natural language, the Transformer idea, particularly the idea of attention, has also been used effectively for processing images and other kinds of information and signals. A comprehensive list of AI software built on the Transformer architecture is available here: https://huggingface.co/docs/transformers/index.
The New AI Race
According to Professor Oded Netzer of Columbia Business School, "the world is no longer the same as before" after ChatGPT appeared. Many businesses and industries that lag in artificial intelligence face the threat of extinction in the near future, while new wealth opportunities worth trillions of dollars in total have emerged. Even giants like Google feel threatened with losing their dominance of the information search market to ChatGPT, to the point of urgently changing strategy and investing in competing solutions.
OpenAI, the company behind ChatGPT, was only established in 2015, but it received over a billion dollars of investment from the start and counted Elon Musk, the world's richest technology billionaire, among its founders. The "Open" in its name was meant to signify the goal of creating open-source AI software to serve humanity. Besides ChatGPT, OpenAI has other famous AI projects, such as DALL-E 2, which generates images corresponding to the sentences users type in. However, Musk has complained that Microsoft is increasingly taking control of OpenAI to serve its own market dominance and profits, departing from the original "open software" idea.
In essence, the algorithmic ideas behind GPT are not secret, and the associated natural language processing software is widely available: anyone can copy it to their computer, make some adjustments, and train it on their own dataset to create a virtual assistant specialized in certain fields, or a general-purpose one. The difficulty lies in achieving large scale and high efficiency. Large language models (LLMs), the artificial neural network models for large-scale natural language processing, currently have hundreds of billions of parameters, and upcoming models from the giants will have trillions. For comparison, popular image processing models currently have fewer than one thousandth as many parameters as these language models.
It is estimated that at least 50 million US dollars' worth of computing resources is required to train software like ChatGPT. However, this cost is not a barrier for large companies and countries, and many will join the AI virtual assistant race, because whoever controls information holds power. If a country controls no virtual assistant of its own, even its history may be rewritten by others through the virtual assistants the world uses.
The author of this article is Professor Nguyen Tien Dung.
Translated by Mina Tran