Solara

general

Llama2 Chatbot

Elder Millenial, 10/25/2023
I'll create a thread here with the source so it doesn't clutter the chat
from dataclasses import dataclass

import solara
from llama_cpp import Llama, ChatCompletionRequestMessage

# SYSTEM (the system prompt message) and LLM (the Llama instance) are defined
# in the model setup code, trimmed here because the post was too long.

@solara.component
def Page():
    history = solara.use_reactive([SYSTEM])
    user_text = solara.use_reactive("")
    assistant_stream = solara.use_reactive("")

    def chat():
        print(user_text.value)
        if user_text.value != "":
            chat_history = list(history.value)
            chat_history.append({"role": "user", "content": user_text.value})
            assert isinstance(history.value, list)
            output = LLM.create_chat_completion(chat_history, stream=True)

            # push each streamed chunk into the reactive variable so the UI updates
            for item in output:
                assistant_stream.value = item["choices"][0]["text"]

            chat_history.append(assistant_stream.value)

            user_text.value = ""
            history.value = chat_history

        print(user_text.value)

    solara.use_thread(chat, dependencies=[history, user_text, assistant_stream])

    with solara.Column():
        # render the conversation, skipping the system prompt
        for value in history.value:
            if value["role"] == "system":
                continue

            if value["role"] == "user":
                with solara.Card(style={"background": "#555555"}):
                    solara.Markdown(value["content"])

            if value["role"] == "assistant":
                with solara.Card(style={"background": "#444444"}):
                    solara.Markdown(value["content"])

        with solara.Card(style={"background": "#666666"}):
            solara.InputText(
                "Ask a question! (hit enter to submit)",
                value=user_text.value,
                on_value=user_text.set,
                disabled=user_text.value != "",
            )

        # while waiting for the model, show progress and the partial answer
        if user_text.value != "":
            solara.ProgressLinear(True)

            with solara.Card(style={"background": "#444444"}):
                solara.Markdown(assistant_stream.value)
I had to delete a few things about the model setup because the post was too long. I can share those as well.
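Roughly, the trimmed part just builds the LLM and SYSTEM objects the component uses. A minimal sketch (the model path and prompt here are placeholders, not my exact setup):

from llama_cpp import Llama

# placeholder path and prompt; the real setup first downloads the weights
# from Meta and converts them to a format llama.cpp can load
LLM = Llama(model_path="./models/llama-2-chat.gguf")
SYSTEM = {"role": "system", "content": "You are a helpful assistant."}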
Withnail, 10/25/2023
do you have a github repo for the model setup?
Elder Millenial, 10/25/2023
Not yet. The model setup isn't super complicated, but you do need to request a download key from Facebook
Withnail, 10/25/2023
sure, just trying to reproduce locally. so it is outputting the LLM result correctly? you just want the text to populate as it is generated?
MaartenBreddels, 10/25/2023
solara.use_thread(chat, dependencies=[history, user_text, assistant_stream])
should be
solara.use_thread(chat, dependencies=[user_text.value])
I think, because you want it to execute when the text changes (the reactive variable will not change).
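to make it concrete: dependencies are compared between renders, and the reactive container is the same object every render, so it never looks changed; the plain string does. A sketch:

user_text = solara.use_reactive("")
# the reactive object compares equal to itself across renders, so a thread
# keyed on it would never restart:
#   solara.use_thread(chat, dependencies=[user_text])
# the string value changes when the user submits, so this restarts the thread:
solara.use_thread(chat, dependencies=[user_text.value])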
Elder Millenial, 10/25/2023
Correct. @MaartenBreddels' fix worked. I'm happy to share my process; it's just involved to set up right now because this is very WIP. Basically you need to sign up to get access to Llama 2, download a couple hundred gigs of models, convert them to another format, install a few libraries from git... It's just a mess right now. I think we could probably create an example using some kind of streamlined functionality. For example, we could replicate this example by streaming converted tokens with a delay. We could modify the new AI example to achieve this. It would be a proof of concept of how to replicate the OpenAI UI that does the same thing, without having to run an actual model.
MaartenBreddels, 10/25/2023
why is there no delay right now?
Elder Millenial, 10/25/2023
Ah, when I said delay, I meant adding a small random delay to simulate the token generation speed of a large language model. It would be purely for visualization purposes.
MaartenBreddels, 10/25/2023
ah, why don't you get a delay from the model itself then? i expect the models to be slow, but they're not?
Elder Millenial, 10/25/2023
I think we might be talking past each other a bit haha. What I'm trying to say is that the models are fairly difficult to set up and run, so setting one up for an easy-to-run example might not be so easy. We could show the ability to have "real time" streaming responses from an AI model by simulating the processing delays with a random sleep. It would just be to show the ability to create an updating text output.
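Something like this sketch is what I mean (the canned text and timings are made up, purely to exercise the streaming UI):

import random
import time

import solara

# placeholder reply standing in for real model output
CANNED_REPLY = "This is a canned answer, streamed word by word to mimic an LLM."

@solara.component
def Page():
    shown = solara.use_reactive("")

    def fake_stream():
        # emit one word at a time with a small random delay, like token generation
        text = ""
        for word in CANNED_REPLY.split():
            time.sleep(random.uniform(0.05, 0.3))
            text += word + " "
            shown.value = text

    # empty dependencies: run the fake stream once when the component mounts
    solara.use_thread(fake_stream, dependencies=[])

    solara.Markdown(shown.value)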
Elder Millenial, 10/25/2023
Just to close the loop on my previous issue, here's a video of the final (working) solution.
MaartenBreddels, 10/26/2023
Ah, now I understand! Yes, we could show the UI that way until it's configured correctly. Same with using OpenAI: if you don't give a token, have some default reply. I like that idea. Are you planning to write an article on that?
Elder Millenial, 10/26/2023
I'm not, but I'd be happy to provide an example and make a Tweet. I really don't like writing articles. I probably should do it more often.
