Llama2 Chatbot

I'll create a thread here with the source so it doesn't clutter the chat
Elder Millenial 10mo ago
from dataclasses import dataclass

import solara
from llama_cpp import Llama, ChatCompletionRequestMessage

# SYSTEM (the system prompt message) and LLM (the llama_cpp.Llama instance)
# come from the model setup, which was cut from this post for length.


@solara.component
def Page():
    history = solara.use_reactive([SYSTEM])
    user_text = solara.use_reactive("")
    assistant_stream = solara.use_reactive("")

    def chat():
        print(user_text.value)
        if user_text.value != "":
            chat_history = list(history.value)
            chat_history.append({"role": "user", "content": user_text.value})
            assert isinstance(history.value, list)
            output = LLM.create_chat_completion(chat_history, stream=True)

            # Accumulate the streamed tokens so the partial reply renders as it is
            # generated; chat-completion chunks put the new text in
            # choices[0]["delta"]["content"] (absent on the first, role-only chunk).
            assistant_stream.value = ""
            for item in output:
                assistant_stream.value += item["choices"][0]["delta"].get("content", "")

            # Store the finished reply as a proper message dict so the render loop
            # below can dispatch on its "role".
            chat_history.append({"role": "assistant", "content": assistant_stream.value})

            user_text.value = ""
            history.value = chat_history

        print(user_text.value)

    solara.use_thread(chat, dependencies=[history, user_text, assistant_stream])

    with solara.Column():
        for value in history.value:
            if value["role"] == "system":
                continue

            if value["role"] == "user":
                with solara.Card(style={"background": "#555555"}):
                    solara.Markdown(value["content"])

            if value["role"] == "assistant":
                with solara.Card(style={"background": "#444444"}):
                    solara.Markdown(value["content"])

        with solara.Card(style={"background": "#666666"}):
            solara.InputText(
                "Ask a question! (hit enter to submit)",
                value=user_text.value,
                on_value=user_text.set,
                disabled=user_text.value != "",
            )

        if user_text.value != "":
            solara.ProgressLinear(True)

            with solara.Card(style={"background": "#444444"}):
                solara.Markdown(assistant_stream.value)
I had to delete a few things about the model setup because the post was too long. I can share those as well.
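For anyone trying to reproduce this, a rough sketch of what the omitted SYSTEM and LLM definitions could look like; the model path and parameters below are placeholders, not the actual setup:

    from llama_cpp import Llama

    # Placeholder system prompt; the real one was part of the omitted setup.
    SYSTEM = {"role": "system", "content": "You are a helpful assistant."}

    # Hypothetical path to a locally converted Llama 2 chat model.
    LLM = Llama(model_path="models/llama-2-13b-chat.gguf", n_ctx=2048)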
withnail 10mo ago
do you have a github repo for the model setup?
Elder Millenial 10mo ago
Not yet. The model setup isn't super complicated, but you do need to request a download key from Facebook
withnail 10mo ago
Sure, just trying to reproduce locally. So it is outputting the LLM result correctly? You just want the text to populate as it is generated?
MaartenBreddels 10mo ago
solara.use_thread(chat, dependencies=[history, user_text, assistant_stream])
should be
solara.use_thread(chat, dependencies=[user_text.value])
I think, because you want it to execute when the text changes (the reactive variable itself will not change)
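A minimal sketch of the hook with that change, assuming the same chat function and reactives as in the snippet above; only the dependency list differs:

    # Re-run chat() when the submitted text changes. Passing user_text.value (a plain
    # string) lets use_thread see a new value; the Reactive object itself keeps the
    # same identity across renders, so it never registers as a change.
    solara.use_thread(chat, dependencies=[user_text.value])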
Elder Millenial 10mo ago
Correct. @MaartenBreddels' fix worked. I'm happy to share my process. It's just involved to set up right now because this is very WIP. Basically you need to sign up to get access to Llama 2, download a couple hundred gigs of models, convert them to another format, install a few libraries from git... it's just a mess right now. I think we could probably create an example using some kind of streamlined functionality. For example, we could replicate this example by streaming converted tokens with a delay. We could probably modify the new AI example to achieve this. It would be a proof of concept of how to replicate the OpenAI UI that does the same thing, without having to run an actual model.
MaartenBreddels 10mo ago
why is there no delay right now?
Elder Millenial 10mo ago
Ah, when I said delay, I meant adding a small random delay to simulate the token generation speed of a large language model. It would be purely for visualization purposes.
MaartenBreddels 10mo ago
Ah, why don't you get the delay from the model itself then? I'd expect the models to be slow, but they're not?
Elder Millenial 10mo ago
I think we might be talking past each other a bit haha. What I'm trying to say is that the models are fairly difficult to set up and run, so trying to set one up for an easy-to-run example might not be practical. We could show the ability to have "real time" streaming responses from an AI model by simulating the processing delays with a random sleep. It would just be to show the ability to create an updating text output.
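A rough sketch of that idea, assuming a canned reply and a random per-token sleep; every name and timing here is made up for illustration:

    import random
    import time

    import solara

    # Hypothetical canned reply used in place of a real model.
    FAKE_REPLY = "This is a canned response streamed one token at a time."

    @solara.component
    def Page():
        prompt = solara.use_reactive("")
        response = solara.use_reactive("")

        def fake_stream():
            if prompt.value == "":
                return
            response.value = ""
            for token in FAKE_REPLY.split():
                # Simulate the per-token latency of a large language model.
                time.sleep(random.uniform(0.05, 0.3))
                response.value += token + " "

        solara.use_thread(fake_stream, dependencies=[prompt.value])

        solara.InputText("Ask a question!", value=prompt.value, on_value=prompt.set)
        solara.Markdown(response.value)

Swapping a loop like this in for LLM.create_chat_completion would let the AI example demo streaming output without downloading any weights.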
Elder Millenial 10mo ago
Just to close the loop on my previous issue, here's a video of the final (working) solution
MaartenBreddels 10mo ago
Ah, now I understand! Yes, we could show the UI that way until it's configured correctly. Same with using OpenAI: if you don't give a token, have some default reply. I like that idea. Are you planning to write an article on that?
Elder Millenial 10mo ago
I'm not, but I'd be happy to provide an example and make a Tweet. I really don't like writing articles. I probably should do it more often.