OpenAssistant, an open source AI bot

OpenAssistant

Open Assistant is a project aimed at giving everyone access to a great chat-based large language model.

Recently the LAION community (Large-Scale Artificial Intelligence Open Network) unveiled through an announcement the first release of the «OpenAssistant» project, which develops an artificial intelligence chatbot capable of understanding and answering questions in natural language, interacting with third-party systems and dynamically extracting the necessary information.

For those unfamiliar with LAION, you should know that it develops tools, models and data collections to create free machine learning systems (for example, the LAION collection is used to train models of the Stable Diffusion image synthesis system).

Addition code to train and organize work of the bot on your computer, it is proposed to use a collection of ready-made models to use already trained and a language model, trained on the basis of 600 thousand examples of dialogues in the form of a request-response (instruction-execution), prepared and revised with the participation of a community of enthusiasts.

An online service to assess the quality of the chatbot was also launched, using the OA_SFT_Llama_30B_6 knowledge model, which covers 30 billion parameters.

Our team has worked tirelessly over the past several months collecting vast amounts of information and text-based feedback to create an incredibly diverse and unique dataset specifically designed for training language models or other AI applications.

With over 600 human-generated data points covering a wide range of topics and writing styles, our data set will prove to be an invaluable tool for any developer looking to create next-generation instructional models.

To increase efficiency of the system and avoid the need to store large amounts of preset parameters, the project foresees the possibility of using a dynamically updated knowledge base that can retrieve the required information through search engines or external services.

For example, when generating responses, the bot can access external APIs to get additional data. Of the advanced features, personalization support is also highlighted, that is, the ability to adapt to a specific user based on their previous phrases.

For those interested in installing OpenAssistant, you should know that you can install it locally, and that candidate Pythia SFT models are available from HuggingFace and can be loaded via the HuggingFace Transformers library. As such, it is possible that they can be used with sufficient hardware. There are also spaces on HF that can be used to chat with the OA candidate without your own hardware. However, these models are not definitive and may lead to poor or unwanted results.

LLaMa SFT models cannot be released directly due to the Meta license, but XOR weights will be released soon.

It is important to mention that the current smallest model (Pythia) has 12B parameters and is difficult to run on consumer hardware, but can run on a single professional GPU. There may be smaller models in the future, and we hope to advance methods like integer quantization that can help run the model on smaller hardware.

The project does not plan to stop at repeating the capabilities of ChatGPT. Open-Assistant is expected to stimulate the development of open development in the field of content generation and query processing in natural languages, just as the open source project Stable Diffusion stimulated the development of image generation tools.

The project code is written in Python and is distributed under the Apache 2.0 license. OpenAssistant developments can be used to create your own intelligent assistants and dialog systems that are not tied to external APIs and services. Conventional consumer hardware is enough to work, for example, it is possible to work on a smartphone. The Open Assistant data is released under a Creative Commons license which allows for a wide range of uses, including commercial use.

Finally, if you are interested in being able to learn more about it as well as being able to consult the source code, you can consult the details In the following link.


Leave a Comment

Your email address will not be published. Required fields are marked with *

*

*

  1. Responsible for the data: AB Internet Networks 2008 SL
  2. Purpose of the data: Control SPAM, comment management.
  3. Legitimation: Your consent
  4. Communication of the data: The data will not be communicated to third parties except by legal obligation.
  5. Data storage: Database hosted by Occentus Networks (EU)
  6. Rights: At any time you can limit, recover and delete your information.