  • sga@piefed.social · 2 days ago

    try to use it with llama cpp if you folks are interested in running local llms - https://github.com/ggml-org/llama.cpp/issues/9181

    the issue is closed, but not because it is solved. check it out, find the link for your relevant hardware (amd, intel, or something else), and see if your particular piece is supported. if so, you have hope.

    in case it is not, try the first-party stuff (intel openvino, intel oneapi, or the amd rocm stack) and use that with the transformers python library, or see if vllm has support.
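
    a rough sketch of the transformers route, assuming torch + transformers + accelerate are installed and your vendor backend (rocm/cuda/xpu) is visible to torch - the model id here is just a small placeholder, not a recommendation:

    ```python
    # sketch only: device_map="auto" (via accelerate) places layers on whatever
    # accelerator torch can see, falling back to cpu otherwise.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # placeholder small model
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "Say hello in one short sentence."
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))
    ```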

    also, check r/localllama on the forbidden website for your particular hardware - someone has likely already done something with it.

      • Hexarei@beehaw.org · 24 hours ago

        It’s not a specific model, it’s a harness for running models. You can find ethically trained models to run on it, though, iirc.

        • sga@piefed.social · 13 hours ago

          pretty much this. I use SmolLM (a 3B-param model trained only on openly available datasets).
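
          (for reference, a minimal sketch of running a small gguf build of a model like that through the llama-cpp-python bindings - the file path is a placeholder:)

          ```python
          # sketch: assumes `pip install llama-cpp-python` and a quantised GGUF on disk
          from llama_cpp import Llama

          llm = Llama(model_path="./SmolLM3-3B-Q4_K_M.gguf", n_ctx=4096)  # placeholder path
          out = llm.create_chat_completion(
              messages=[{"role": "user", "content": "Summarise what an NPU does."}],
              max_tokens=64,
          )
          print(out["choices"][0]["message"]["content"])
          ```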

      • dyathinkhesaurus@lemmy.world · 2 days ago

        llama.cpp is not the same thing as ollama. It does a similar job, but better imo. It can multiplex/multithread sessions/conversations with llms (where ollama, lmstudio, etc. have to queue them up), stuff like that.
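
        A rough sketch of what that looks like from the client side, assuming a local llama-server started with parallel slots (e.g. `llama-server -m model.gguf -np 4`) and the `requests` package - two requests fired at once get decoded together instead of queued:

        ```python
        # sketch: llama-server exposes an openai-compatible endpoint on port 8080 by default;
        # with -np > 1 it can serve several conversations concurrently.
        import concurrent.futures
        import requests

        def ask(question):
            r = requests.post(
                "http://localhost:8080/v1/chat/completions",
                json={"messages": [{"role": "user", "content": question}], "max_tokens": 64},
            )
            return r.json()["choices"][0]["message"]["content"]

        with concurrent.futures.ThreadPoolExecutor() as pool:
            print(list(pool.map(ask, ["What is an NPU?", "What is a GPU?"])))
        ```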

        • sga@piefed.social · 1 day ago

          further clarification - ollama is a distribution of llama cpp (and it is a bit commercial in some sense). basically, in ye olde days of 2023-24 (decades ago in llm-space, as they say), llama cpp was a server/cli-only thing. it would provide output in the terminal (that is how i used to use it back then) or via an api (an openai-compatible one, so if you used openai stuff before, you can easily swap over). many people wanted a gui (a web-based chat interface), so ollama back then was a wrapper around llama cpp (there were several others, but ollama was relatively mainstream).

          then, as time progressed, ollama "allegedly enshittified", while llama cpp kept getting features (a web ui, the ability to swap models at runtime (which back then required a separate tool, llama-swap), etc.). the llama cpp stack is also a bit "lighter" (not really - they both are web tech, so as light as js can get) and first-party(ish - the interface was done by the community, but it lives in the same git repo), so more and more local llama folk kept switching to a llama cpp-only setup (you could use llama cpp with ollama, but at that point ollama was just a web ui, and not a great one; some people preferred comfyui, etc.). some old-timers (like me) never even tried ollama, as plain llama.cpp was sufficient for us.
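
          (to illustrate the "openai compatible" part - a sketch using the official openai python client pointed at a local llama-server; only the base_url and a dummy key change:)

          ```python
          # sketch: same client code as the hosted api, just re-pointed locally;
          # the api key can be any non-empty string unless the server was started with --api-key.
          from openai import OpenAI

          client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
          resp = client.chat.completions.create(
              model="local",  # llama-server serves whatever model it was started with
              messages=[{"role": "user", "content": "hello from the local api"}],
          )
          print(resp.choices[0].message.content)
          ```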

          as the above commenter said, you can do very fancy things with llama cpp. the best thing about it is that it works with both cpu and gpu - you can use both simultaneously, as opposed to vllm or transformers, where you almost always need a gpu. this simultaneous use is called offloading: some of the layers are kept in system memory instead of vram, which is how the vram-poor population got by on ram (this also kinda contributed to ram inflation, but do not blame llama cpp for it, blame people). you can do some of this on ollama too (as ollama is a wrapper around llama cpp), but that requires the ollama folks to keep their fork up to date with the parent as well as expose the said features in the ui.
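
          (a sketch of what offloading looks like through the llama-cpp-python bindings, assuming a gpu-enabled build - n_gpu_layers decides how many layers go to vram, the rest stay in system ram; the path and layer count are placeholders:)

          ```python
          # sketch: requires llama-cpp-python built with a gpu backend (cuda/rocm/vulkan)
          from llama_cpp import Llama

          llm = Llama(
              model_path="./some-model-Q4_K_M.gguf",  # placeholder path
              n_gpu_layers=20,  # put 20 layers in vram, keep the remaining layers in system ram
              n_ctx=4096,
          )
          print(llm("Q: what is offloading?\nA:", max_tokens=48)["choices"][0]["text"])
          ```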

  • Sparrow_1029@programming.dev · 2 days ago

    I got an Asus Vivobook with a pretty new AMD chipset that has one… running Fedora, no idea how to make anything use it either :/

    • Territorial@piefed.ca · 1 day ago

      Also on an Asus laptop with Fedora. After following some obscure instructions from AMD, I managed to get a Python script to confirm the NPU exists and is technically functional… but apparently, to do anything approaching useful, I need the slightly fancier NPU. What a waste.

        • knightly the Sneptaur@pawb.social · 1 day ago

          You can do lots of useful work with processors designed for large matrix calculus, but not a lot of folks want to run physics sims for some reason.

            • knightly the Sneptaur@pawb.social · 22 hours ago

              If there are game devs that want to try making a game out of a fluid sim, I’d be all for it. Realistically, most devs will have to wait for better middleware that adds NPU matrix acceleration to existing game engines.

  • √𝛂𝛋𝛆@piefed.world · 2 days ago

    You need a model compiled for the architecture. I saw some for the RK35xx devices when shopping for hardware. I do not think there is general-purpose software for splitting up or running arbitrary models on an NPU; the models must be configured for the physical hardware topology. The stuff that runs on most devices is very small, and these either need a ton of custom fine-tuning or they are barely capable of simple tasks.

    On the other hand, segmentation models are small, which is what makes layers, object identification, and background removal work. Looking at your CPU speed and available memory, it is unlikely to make much difference. You are also memory-constrained for running models, though you could use DeepSpeed to load from a disk drive too.
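
    For the background-removal case, a small sketch with the rembg package (which runs a compact u2net segmentation model through onnxruntime, on CPU by default) - file names are placeholders:

    ```python
    # sketch: assumes `pip install rembg pillow`
    from rembg import remove
    from PIL import Image

    img = Image.open("photo.jpg")            # placeholder input
    cutout = remove(img)                     # returns the image with the background masked out
    cutout.save("photo_no_background.png")   # PNG to keep the alpha channel
    ```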

  • hperrin@lemmy.ca · 2 days ago

    The only thing I can think of is AI video game upscalers. Other than that, yeah, it’s a waste of silicon.