Llama CPP
Only available on Node.js.
This module is based on the node-llama-cpp Node.js bindings for llama.cpp, allowing you to work with a locally running LLM. This allows you to work with a much smaller quantized model capable of running on a laptop environment, ideal for testing and scratch padding ideas without running up a bill!
Setup
You'll need to install the node-llama-cpp module to communicate with your local model.
- npm
- Yarn
- pnpm
npm install -S node-llama-cpp @langchain/community
yarn add node-llama-cpp @langchain/community
pnpm add node-llama-cpp @langchain/community
You will also need a local Llama 2 model (or a model supported by node-llama-cpp). You will need to pass the path to this model to the LlamaCpp module as a part of the parameters (see example).
Out-of-the-box node-llama-cpp
is tuned for running on a MacOS platform with support for the Metal GPU of Apple M-series of processors. If you need to turn this off or need support for the CUDA architecture then refer to the documentation at node-llama-cpp.
For advice on getting and preparing llama2
see the documentation for the LLM version of this module.
A note to LangChain.js contributors: if you want to run the tests associated with this module you will need to put the path to your local model in the environment variable LLAMA_PATH
.
Usage
Basic use
In this case we pass in a prompt wrapped as a message and expect a response.
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin";
const model = new ChatLlamaCpp({ modelPath: llamaPath });
const response = await model.invoke([
new HumanMessage({ content: "My name is John." }),
]);
console.log({ response });
/*
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hello John.',
additional_kwargs: {}
},
lc_namespace: [ 'langchain', 'schema' ],
content: 'Hello John.',
name: undefined,
additional_kwargs: {}
}
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp
- HumanMessage from
@langchain/core/messages
System messages
We can also provide a system message, note that with the llama_cpp
module a system message will cause the creation of a new session.
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin";
const model = new ChatLlamaCpp({ modelPath: llamaPath });
const response = await model.invoke([
new SystemMessage(
"You are a pirate, responses must be very verbose and in pirate dialect, add 'Arr, m'hearty!' to each sentence."
),
new HumanMessage("Tell me where Llamas come from?"),
]);
console.log({ response });
/*
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: "Arr, m'hearty! Llamas come from the land of Peru.",
additional_kwargs: {}
},
lc_namespace: [ 'langchain', 'schema' ],
content: "Arr, m'hearty! Llamas come from the land of Peru.",
name: undefined,
additional_kwargs: {}
}
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp
- SystemMessage from
@langchain/core/messages
- HumanMessage from
@langchain/core/messages
Chains
This module can also be used with chains, note that using more complex chains will require suitably powerful version of llama2
such as the 70B version.
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { LLMChain } from "langchain/chains";
import { PromptTemplate } from "@langchain/core/prompts";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin";
const model = new ChatLlamaCpp({ modelPath: llamaPath, temperature: 0.5 });
const prompt = PromptTemplate.fromTemplate(
"What is a good name for a company that makes {product}?"
);
const chain = new LLMChain({ llm: model, prompt });
const response = await chain.invoke({ product: "colorful socks" });
console.log({ response });
/*
{
text: `I'm not sure what you mean by "colorful socks" but here are some ideas:\n` +
'\n' +
'- Sock-it to me!\n' +
'- Socks Away\n' +
'- Fancy Footwear'
}
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp
- LLMChain from
langchain/chains
- PromptTemplate from
@langchain/core/prompts
Streaming
We can also stream with Llama CPP, this can be using a raw 'single prompt' string:
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin";
const model = new ChatLlamaCpp({ modelPath: llamaPath, temperature: 0.7 });
const stream = await model.stream("Tell me a short story about a happy Llama.");
for await (const chunk of stream) {
console.log(chunk.content);
}
/*
Once
upon
a
time
,
in
a
green
and
sunny
field
...
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp
Or you can provide multiple messages, note that this takes the input and then submits a Llama2 formatted prompt to the model.
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin";
const llamaCpp = new ChatLlamaCpp({ modelPath: llamaPath, temperature: 0.7 });
const stream = await llamaCpp.stream([
new SystemMessage(
"You are a pirate, responses must be very verbose and in pirate dialect."
),
new HumanMessage("Tell me about Llamas?"),
]);
for await (const chunk of stream) {
console.log(chunk.content);
}
/*
Ar
rr
r
,
me
heart
y
!
Ye
be
ask
in
'
about
llam
as
,
e
h
?
...
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp
- SystemMessage from
@langchain/core/messages
- HumanMessage from
@langchain/core/messages
Using the invoke
method, we can also achieve stream generation, and use signal
to abort the generation.
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin";
const model = new ChatLlamaCpp({ modelPath: llamaPath, temperature: 0.7 });
const controller = new AbortController();
setTimeout(() => {
controller.abort();
console.log("Aborted");
}, 5000);
await model.invoke(
[
new SystemMessage(
"You are a pirate, responses must be very verbose and in pirate dialect."
),
new HumanMessage("Tell me about Llamas?"),
],
{
signal: controller.signal,
callbacks: [
{
handleLLMNewToken(token) {
console.log(token);
},
},
],
}
);
/*
Once
upon
a
time
,
in
a
green
and
sunny
field
...
Aborted
AbortError
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp
- SystemMessage from
@langchain/core/messages
- HumanMessage from
@langchain/core/messages
Related
- Chat model conceptual guide
- Chat model how-to guides