A space dedicated to the history of science and technology, supported by AI.
Stella Space is a blog where the history of science and technology is interleaved with questions about Artificial Intelligence, focused on Natural Language Processing (NLP).
The current model occupies 1.5 GB. It was trained on an NVIDIA RTX 2070 board with 8 GB of memory for 20 hours. Training used a database of 33,673 questions and 74,195 answers over 48 epochs. Its structure has some similarities with OpenAI's GPT-2, although that model is approximately 500 MB. The answers-to-questions ratio is approximately 2.203.
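The figures above can be checked with a few lines of Python. The ratio uses only the numbers given in the post. The parameter estimate is a rough illustration under an assumption the post does not state: that the 1.5 GB file holds little besides weights stored as 32-bit floats (4 bytes each), with 1 GB taken as 10^9 bytes.

```python
# Figures quoted in the post.
QUESTIONS = 33_673
ANSWERS = 74_195
MODEL_SIZE_GB = 1.5


def answers_per_question(answers: int, questions: int) -> float:
    """Average number of answers available for each question."""
    return answers / questions


def estimated_parameters(size_gb: float, bytes_per_param: int = 4) -> float:
    """Rough parameter count from checkpoint size.

    Assumption (not stated in the post): weights are stored as 32-bit
    floats and 1 GB = 10**9 bytes.
    """
    return size_gb * 1e9 / bytes_per_param


ratio = answers_per_question(ANSWERS, QUESTIONS)
print(f"answers/questions = {ratio:.3f}")
print(f"~{estimated_parameters(MODEL_SIZE_GB) / 1e6:.0f} M parameters (fp32 assumption)")
```

Under that assumption the 1.5 GB model would hold roughly 375 million parameters; with 16-bit weights the estimate would double.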
The next experiments will use four machines: three with RTX 2070 boards and a fourth with an NVIDIA V100 with 24 GB. The aim is to reduce training time and energy consumption, or to train on a larger database in the same amount of time.
The database is mainly about the life and work of STEM scientists: their problems, families, social and political context, the technological limitations they faced and how they overcame them. Many were Nobel Prize laureates in physics, chemistry and medicine, but the database also covers scientists from electricity, electronics, computer science and related areas.
In total, 33,673 questions were written and more than 70,000 high-quality answers were provided.
The overall quality is very good, but the model sometimes confuses scientists' names and facts. At a later stage, a confusion matrix will be used to classify answers and prevent some of them from being given. For now, the longer the question, the higher the probability of mistakes and nonsense, so questions longer than 500 tokens should be avoided.
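The two safeguards described above, a length limit on questions and a confusion matrix over answer classifications, can be sketched in Python as follows. This is a minimal illustration, not the engine's actual code. The model's real tokenizer is not described, so whitespace splitting stands in for it; the labels "correct" and "confused" are hypothetical examples of how reviewed answers might be classified.

```python
from collections import Counter

MAX_TOKENS = 500  # limit suggested in the post


def whitespace_tokenize(text: str) -> list:
    """Stand-in tokenizer (assumption): splits on whitespace only.

    The model's real tokenizer would normally produce more tokens
    than words, so this check is optimistic.
    """
    return text.split()


def question_is_acceptable(question: str,
                           tokenize=whitespace_tokenize,
                           max_tokens: int = MAX_TOKENS) -> bool:
    """Return True if the question stays within the token limit."""
    return len(tokenize(question)) <= max_tokens


def confusion_matrix(true_labels, predicted_labels, labels):
    """Count (true, predicted) label pairs as matrix[true][predicted]."""
    counts = Counter(zip(true_labels, predicted_labels))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


# Hypothetical example: labels from a human reviewer vs. a classifier.
truth = ["correct", "correct", "correct", "confused"]
predicted = ["correct", "correct", "confused", "confused"]
matrix = confusion_matrix(truth, predicted, ["correct", "confused"])
print(matrix)
```

Passing the real tokenizer through the `tokenize` parameter would make the 500-token check match what the model actually sees.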
The model completely filled the 8 GB of GPU memory, as can be seen with the nvidia-smi utility.
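For a scripted view of the same information, nvidia-smi can print per-GPU memory as CSV with `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` (values in MiB). The Python sketch below wraps that call and parses the output. It is an illustration only; `query_gpu_memory` needs an NVIDIA driver to run, and the sample line in the usage note is hypothetical, not a measurement from this model.

```python
import subprocess


def parse_memory_csv(csv_text: str) -> list:
    """Parse 'used, total' lines (MiB, one per GPU) into (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> list:
    """Ask nvidia-smi for memory usage of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(out)


def usage_percent(used: int, total: int) -> float:
    """Fraction of GPU memory in use, as a percentage."""
    return 100.0 * used / total
```

For example, a hypothetical output line `8100, 8192` parses to `[(8100, 8192)]`, i.e. about 98.9% of an 8 GB board in use.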
In the near future, the model will be migrated and adapted to an RTX 4090 GPU with 24 GB of memory, in an AMD-based server with 32 GB of RAM.
Since the subject is very specific and most articles are interconnected, the final model quality is very good. Training tests never exceeded a week of running time.
STELLA ZOE (*1) is the name of the model, and GINA (*2), the engine that supports her, is an Ubuntu application written in a mix of C++ and Python, chosen for processing speed and development speed respectively. Its gateway runs on a Raspberry Pi 4 (RPI4), also with Ubuntu 20.04 for 64-bit Pi.
Some articles produced by Stella with my support will be published on this blog.
We thank the OpenAI team for all the support given and for access to its GPT-3.5 API. Without their help, this work could not have been done.
(*1) - Smart TExt Language Learning Assistant (STELLA)
(*2) - Generative Inference and Neural Architecture (GINA)