small chat feature #11

Open
opened 2024-08-12 18:40:09 +01:00 by not-nullptr · 6 comments
not-nullptr commented 2024-08-12 18:40:09 +01:00 (Migrated from github.com)

would be super cute honestly !! it already has a tuned-in counter and a websocket server running. would be happy to implement this if i knew it was gonna be merged :3

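A minimal sketch of what such a chat relay could look like, assuming the server uses FastAPI (the `/chat` route and `ConnectionManager` helper here are hypothetical, not infinifi's actual code):

```python
# hypothetical sketch: piggybacking a chat relay on the existing websocket
# server. assumes FastAPI; endpoint name and helper class are made up.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

class ConnectionManager:
    def __init__(self):
        self.active: list[WebSocket] = []

    async def connect(self, ws: WebSocket):
        await ws.accept()
        self.active.append(ws)

    def disconnect(self, ws: WebSocket):
        self.active.remove(ws)

    async def broadcast(self, message: str):
        for ws in self.active:
            await ws.send_text(message)

manager = ConnectionManager()

@app.websocket("/chat")
async def chat(ws: WebSocket):
    await manager.connect(ws)
    try:
        while True:
            msg = await ws.receive_text()
            await manager.broadcast(msg[:200])  # cap message length
    except WebSocketDisconnect:
        manager.disconnect(ws)
```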
kennethnym commented 2024-08-12 21:09:13 +01:00 (Migrated from github.com)

that sounds like a really cool idea!! my only worry is that it's gonna be ruined by spam and the like :(

not-nullptr commented 2024-08-12 21:14:58 +01:00 (Migrated from github.com)

hmm.. well it's already running inference for MusicLM, which is notoriously hard to run. does the server have enough VRAM left for a small LLM for sentiment analysis, maybe `gemma2:2b`, which is insanely small for the quality? https://ollama.com/library/gemma2:2b

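For reference, a rough sketch of what a moderation check against a local ollama instance could look like; `POST /api/generate` is ollama's standard endpoint, but the prompt wording and the YES/NO convention are assumptions:

```python
# rough sketch: ask a local gemma2:2b (served by ollama) for a yes/no
# moderation verdict on a chat message. prompt wording is made up.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def looks_spammy(message: str) -> bool:
    prompt = (
        "You are a chat moderator. Reply with exactly YES if the message "
        f"is spam or abusive, otherwise NO.\n\nMessage: {message}"
    )
    r = requests.post(
        OLLAMA_URL,
        json={"model": "gemma2:2b", "prompt": prompt, "stream": False},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()["response"].strip().upper().startswith("YES")
```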
kennethnym commented 2024-08-12 23:28:29 +01:00 (Migrated from github.com)

i have 6gb of vram left, should be able to run a small LLM? but the gpu is basically always at 100% usage due to it constantly churning out new clips, so i don't know if it can handle another LLM. i definitely CANNOT afford to spin up another gpu 😭

[screenshot: Screenshot 2024-08-12 at 23 26 40]
not-nullptr commented 2024-08-12 23:32:26 +01:00 (Migrated from github.com)

i've just looked into it, and running an LLM probably isn't worth it. after some local testing, small models are way too overbearing and censor regular conversation. might just be worth having an IP rate limit

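A toy per-IP sliding-window limiter, just to illustrate the idea (the window size and message cap are arbitrary):

```python
# toy per-IP sliding-window rate limiter; numbers are arbitrary.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_MESSAGES = 5

_history: dict[str, deque] = defaultdict(deque)

def allow_message(ip: str) -> bool:
    now = time.monotonic()
    q = _history[ip]
    # drop timestamps that have fallen out of the window
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_MESSAGES:
        return False
    q.append(now)
    return True
```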
kennethnym commented 2024-08-13 11:56:43 +01:00 (Migrated from github.com)

hmm i see, i will keep this open for now. i do want to implement this in v2, but the main focus of v2 right now is a fine-tuned model + dynamic prompt generation for more variety. thank u for this cute suggestion though!

feel free to drop a PR if u want, but again the model goes first, i will review it once the model is trained :)

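Purely illustrative of the dynamic-prompt idea mentioned above: assembling generation prompts from small word pools rather than a fixed string (all the word lists here are made up, not infinifi's actual prompts):

```python
# illustrative only: build a random lofi prompt from small word pools
# instead of reusing one fixed prompt, for more variety between clips.
import random

MOODS = ["mellow", "dreamy", "rainy-day", "late-night"]
INSTRUMENTS = ["rhodes piano", "nylon guitar", "soft synth pads", "upright bass"]
TEXTURES = ["vinyl crackle", "tape hiss", "soft rain ambience"]

def random_prompt() -> str:
    return (
        f"{random.choice(MOODS)} lofi beat with {random.choice(INSTRUMENTS)} "
        f"and {random.choice(TEXTURES)}"
    )
```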
aryanranderiya commented 2024-10-14 06:25:32 +01:00 (Migrated from github.com)

Hey, you could look into Cloudflare Workers AI. You get like 100k requests a day for free, and maybe it's something that'll work out of the box? Not for fine-tuning tho

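A hedged sketch of calling Workers AI over its REST API to classify a message; the account id and token are placeholders, and `@cf/huggingface/distilbert-sst-2-int8` is one of the small text-classification models Workers AI hosts:

```python
# sketch: score a chat message with Cloudflare Workers AI over REST.
# account id, API token, and model choice are placeholders.
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]
MODEL = "@cf/huggingface/distilbert-sst-2-int8"  # small text classifier

def classify(message: str) -> list[dict]:
    url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
    r = requests.post(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"text": message},
        timeout=10,
    )
    r.raise_for_status()
    # e.g. [{"label": "NEGATIVE", "score": ...}, {"label": "POSITIVE", ...}]
    return r.json()["result"]
```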