The quanteda.llm package makes it easy to use LLMs with quanteda corpora (or character vectors and data frames) for classification, summarisation, scoring, and analysis of documents and text. quanteda provides a host of convenient functions for managing, manipulating, and describing corpora, as well as linking document variables and metadata to the documents themselves. quanteda.llm makes it convenient to pass these texts to LLMs for analysis or classification, creating new document variables from the LLMs' outputs.
## Included functions

The package includes the following functions:

- `ai_text()`:
  - A generic function that can be used with any LLM supported by ellmer.
  - Generates structured responses or classifications based on pre-defined instructions for texts in a quanteda corpus.
  - Users can flexibly define the prompt and the structure of responses via `type_object()` from the ellmer package.
  - Users can add a dataset with examples to improve LLM performance (few-shot prompting).
  - Supports resuming interrupted processes in a `result_env` environment.
- `ai_validate()`:
  - Starts an interactive app to manually validate the LLM-generated outputs.
  - Allows users to review the LLM-generated outputs and justifications, marking them as valid or invalid.
  - Supports resuming the validation process in a `result_env` environment in case of interruptions.
- `ai_summary()`:
  - A wrapper around `ai_text()` for summarising documents in a corpus.
  - Uses a pre-defined `type_object()` to structure the summary output.
- `ai_salience()`:
  - A wrapper around `ai_text()` for computing salience scores for topics in a corpus.
  - Uses a pre-defined `type_object()` to structure the salience classification output.
- `ai_score()`:
  - A wrapper around `ai_text()` for scoring documents against a scale defined by a prompt.
  - Uses a pre-defined `type_object()` to structure the scoring output.
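As a sketch of how these pieces fit together, the wrapper functions can be applied to a corpus with any ellmer chat backend. The `chat_fn` and `model` argument names below are assumptions about the interface; consult the package documentation for the exact signature:

```r
library(quanteda)
library(quanteda.llm)
library(ellmer)

# summarise the inaugural speeches corpus shipped with quanteda,
# using a locally running Ollama model (model name is an assumption)
summaries <- ai_summary(data_corpus_inaugural,
                        chat_fn = chat_ollama,
                        model = "llama3.2")
```

The result can then be merged back into the corpus as new document variables for downstream quanteda analysis.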
## Supported LLMs

The package supports all LLMs currently available with the ellmer package, including:

- Anthropic's Claude: `chat_anthropic()`
- AWS Bedrock: `chat_aws_bedrock()`
- Azure OpenAI: `chat_azure_openai()`
- Cloudflare: `chat_cloudflare()`
- Databricks: `chat_databricks()`
- DeepSeek: `chat_deepseek()`
- GitHub model marketplace: `chat_github()`
- Google Gemini/Vertex AI: `chat_google_gemini()`, `chat_google_vertex()`
- Groq: `chat_groq()`
- Hugging Face: `chat_huggingface()`
- Mistral: `chat_mistral()`
- Ollama: `chat_ollama()`
- OpenAI: `chat_openai()`
- OpenRouter: `chat_openrouter()`
- perplexity.ai: `chat_perplexity()`
- Snowflake Cortex: `chat_snowflake()` and `chat_cortex_analyst()`
- vLLM: `chat_vllm()`
For authentication and usage of each of these LLMs, please refer to the respective ellmer documentation. For example, to use the `chat_openai()` models, you need to sign up for an API key from OpenAI, which you can save in your `.Renviron` file as `OPENAI_API_KEY`. To use the `chat_ollama()` models, first download and install Ollama, then install some models either from the command line (e.g. with `ollama pull llama3.1`) or within R using the rollama package. The Ollama app must be running for the models to be used.
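Once a key is in place, an ellmer chat client can be created directly; the model name below is an illustrative assumption, not a recommendation:

```r
library(ellmer)

# assumes OPENAI_API_KEY has been set in .Renviron
chat <- chat_openai(model = "gpt-4o-mini")
```

A chat client (or the corresponding `chat_*` function) is what you then hand to the quanteda.llm functions described above.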
## Installation

You can install the development version of quanteda.llm from https://github.com/quanteda/quanteda.llm with:
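A typical GitHub installation, here using the remotes package (pak or devtools would work equally well):

```r
# install.packages("remotes")  # if not already installed
remotes::install_github("quanteda/quanteda.llm")
```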