cross-posted from: https://lemmit.online/post/4242386
This is an automated archive made by the Lemmit Bot.
The original was posted on /r/pcmasterrace by /u/trander6face on 2024-10-24 11:11:47+00:00.
You can use it for blurring your background on your camera
ah that’s good
You can already do that with the lens though?
If you have a professional DSLR, sure. Not with a webcam.
I would have preferred if they had used the die space for the GPU, not this bullshit.
But then, how would ai save our future?
try it with llama.cpp if you folks are interested in running local llms - https://github.com/ggml-org/llama.cpp/issues/9181
the issue is closed, but not because it is solved. check it out, find the link for your relevant hardware (amd or intel or something else), and see if your particular piece is supported. if so, you have hope.
in case it is not, try to find the first-party stuff (intel openvino, intel oneapi, or the amd rocm stack) and use that with python transformers, or see if vllm has support (rough sanity-check sketch below).
also, try to check r/localllama on the forbidden website for your particular hardware - there is likely someone who has done something with it.
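before wiring any of that up, a quick check of what your pytorch build actually sees can save time. this is only a rough sketch - npus will not show up here at all, since they need the vendor runtime rather than the standard cuda/hip path:

```python
import torch

# Minimal sanity check: what accelerator does this PyTorch build see?
# On ROCm builds torch.cuda.* is backed by HIP, so is_available()
# also covers AMD GPUs. NPUs are NOT visible through this path.
print("accelerator available:", torch.cuda.is_available())
print("rocm/hip build:", torch.version.hip)  # None on CUDA or CPU-only builds
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```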
So, what? Is that only trained on ethical data?
Because I bloody doubt it
It's not a specific model, it's a harness for running models. You can find ethically trained models to run on it though, iirc.
pretty much this. I use SmolLM (a 3B-param model, trained only on openly available datasets).
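For anyone who wants to try the same route, a minimal transformers sketch looks roughly like this. The checkpoint name is just a stand-in (a smaller SmolLM2 instruct model); swap in whichever variant you actually run. `device_map="auto"` needs the accelerate package and falls back to CPU if no GPU is visible:

```python
from transformers import pipeline

# Stand-in checkpoint; replace with the SmolLM variant you actually use.
generate = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",
    device_map="auto",  # needs `accelerate`; picks a GPU if available, else CPU
)

out = generate("Explain in one sentence what an NPU is.", max_new_tokens=64)
print(out[0]["generated_text"])
```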
I have heard of ollama before, is this the same thing?
llama.cpp is not the same thing as ollama. It does a similar thing, but better imo: it can multiplex/multithread sessions/conversations with llms (where ollama, lmstudio, etc. have to queue them up), stuff like that.
further clarification - ollama is a distribution of llama.cpp (and it is a bit commercial in some sense). in ye olde days of 2023-24 (decades in llm space, as they say), llama.cpp was a server/cli-only thing: it gave you output in the terminal (that is how i used it back then) or via an api (an openai-compatible one, so if you used openai stuff before, you can swap over easily). many people wanted a gui (a web-based chat interface), so ollama back then was a wrapper around llama.cpp (there were several others, but ollama was the relatively mainstream one).

then, as time went on, ollama "allegedly enshittified", while llama.cpp kept getting features: a web ui, the ability to swap models at runtime (which used to require a separate llama-swap), etc. the llama.cpp stack is also a bit "lighter" (not really, both are web tech, so as light as js can get) and first-party(ish - the interface was done by the community, but it lives in the same git repo). so more and more localllama folk switched to a llama.cpp-only setup (you could still use llama.cpp with ollama, but at that point ollama was just a web ui, and not a great one; some people preferred comfyui, etc). some old timers (like me) never even tried ollama, as plain llama.cpp was sufficient for us.
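for reference, talking to a running llama-server over that openai-compatible api from python looks roughly like this (rough sketch: assumes the server is already up on port 8080 with some gguf loaded, and uses the official openai python client):

```python
from openai import OpenAI

# rough sketch: llama-server started with something like
#   llama-server -m some-model.gguf --port 8080
# exposes an openai-compatible endpoint under /v1
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="whatever",  # the server answers with whichever model it loaded
    messages=[{"role": "user", "content": "hello from the local api"}],
)
print(resp.choices[0].message.content)
```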
as the above commenter said, you can do very fancy things with llama.cpp. the best thing about it is that it works with both cpu and gpu - you can even use both simultaneously, as opposed to vllm or transformers, where you almost always need a gpu. this simultaneous use is called offloading: some of the layers are kept in system memory instead of vram, which is how the vram-poor population got by on plain ram (this also kinda led to ram inflation, but do not blame llama.cpp for that, blame people). you can do some of this through ollama (since it is a wrapper around llama.cpp), but that requires the ollama folks to keep their fork up to date with the parent and to expose those features in the ui.
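if you want to poke at that offloading from python rather than the cli, the llama-cpp-python bindings expose it as n_gpu_layers. rough sketch with a hypothetical model path:

```python
from llama_cpp import Llama

# partial offload: the first 20 layers go to vram, the rest stay in system ram.
# set n_gpu_layers=0 for pure cpu, or -1 to offload as much as possible.
llm = Llama(
    model_path="models/some-model.gguf",  # hypothetical path
    n_gpu_layers=20,
    n_ctx=4096,
)

out = llm("Q: why would you offload only some layers? A:", max_tokens=128)
print(out["choices"][0]["text"])
```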
I checked, ollama does not use npu
I got an Asus Vivobook with a pretty new AMD chipset that has one… running Fedora, no idea how to make something use it either :/
Also on an Asus laptop with Fedora, after following some obscure instructions from amd I managed to get a Python script to confirm the npu exists and is technically functional… But apparently to do anything approaching useful, I need the slightly fancier npu. What a waste.
I wonder if they have implemented NPU support into Folding@Home yet.
There are no plans for it, at least not for a while. Apparently F@H workloads are not suitable for the current NPUs
They need hardware that does actual useful work, not hardware accelerated autocorrect.
You can do lots of useful work with processors designed for large matrix calculus, but not a lot of folks want to run physics sims for some reason.
Sounds like it should do well with some advanced 3D stuff? Or wonky physics games like Goat Simulator or Amazing Frog
If there are game devs that want to try making a game out of a fluid sim, I’d be all for it. Realistically, most devs will have to wait for better middleware that adds NPU matrix acceleration to existing game engines.
You need a model compiled for the architecture. I saw some for the RK35xx devices when shopping for hardware. I do not think there is software that can split up or run arbitrary models on an NPU; the models must be configured for the physical hardware topology. The stuff that runs on most devices is very small, and those models either need a ton of custom fine-tuning or they are barely capable of simple tasks.
On the other hand, segmentation models are small, and that is what makes layers, object identification, and background removal work. Looking at your CPU speed and available memory, it is unlikely to make much difference. You are also memory-constrained for running models, though you could use DeepSpeed to load from a disk drive too.
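To make the segmentation-plus-blur idea concrete, here is a rough CPU-only sketch using MediaPipe's selfie segmentation and OpenCV; an NPU-accelerated version would need the vendor's runtime instead, and the file names are placeholders:

```python
import cv2
import mediapipe as mp
import numpy as np

# Rough sketch: blur everything the segmentation model does not flag as "person".
frame = cv2.imread("frame.jpg")  # placeholder input frame

with mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1) as seg:
    result = seg.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

mask = np.stack((result.segmentation_mask,) * 3, axis=-1) > 0.5  # person mask
blurred = cv2.GaussianBlur(frame, (55, 55), 0)                   # heavy blur
cv2.imwrite("blurred.jpg", np.where(mask, frame, blurred))
```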
The only thing I can think of is AI video game upscalers. Other than that, yeah, it’s a waste of silicon.
which software exactly (that uses npu instead of gpu)?
I don’t think there is one currently, but that’s a potential use.
Copilot, for example, and other LLM-style neural network workloads not tied to Nvidia.
https://github.com/FastFlowLM/FastFlowLM
That RTX 4050 is probably even faster though.