The quanteda.llm package makes it easy to use LLMs with quanteda corpora (or character vectors and data frames) for classification, summarisation, scoring, and analysis of documents and text. quanteda provides a host of convenient functions for managing, manipulating, and describing corpora, as well as linking document variables and metadata to the documents themselves. quanteda.llm makes it convenient to pass these texts to LLMs for analysis or classification, creating new document variables from the LLMs' outputs.
## Included functions

The package includes the following functions:

- `ai_text()`:
  - A generic function that can be used with any LLM supported by ellmer.
  - Generates structured responses or classifications based on pre-defined instructions for texts in a quanteda corpus.
  - Users can flexibly define the prompt and the structure of responses via `type_object()` from the ellmer package.
  - Users can add a dataset with examples to improve LLM performance (few-shot prompting).
  - Supports resuming interrupted processes in a `result_env` environment.
- `ai_validate()`:
  - Starts an interactive app to manually validate the LLM-generated outputs.
  - Allows users to review the LLM-generated outputs and justifications, marking them as valid or invalid.
  - Supports resuming the validation process in a `result_env` environment in case of interruptions.
- `ai_summary()`:
  - A wrapper around `ai_text()` for summarising documents in a corpus.
  - Uses a pre-defined `type_object()` to structure the summary output.
- `ai_salience()`:
  - A wrapper around `ai_text()` for computing salience scores for topics in a corpus.
  - Uses a pre-defined `type_object()` to structure the salience classification output.
- `ai_score()`:
  - A wrapper around `ai_text()` for scoring documents against a scale defined by a prompt.
  - Uses a pre-defined `type_object()` to structure the scoring output.
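As a sketch of how these pieces fit together, the wrapper functions can be applied to a corpus with any ellmer chat backend. The `chat_fn` and `model` argument names below are assumptions about the interface; consult the package documentation for the exact signature:

```r
library(quanteda)
library(quanteda.llm)
library(ellmer)

# summarise the inaugural speeches corpus shipped with quanteda,
# using a locally running Ollama model (model name is an assumption)
summaries <- ai_summary(data_corpus_inaugural,
                        chat_fn = chat_ollama,
                        model = "llama3.2")
```

The result can then be merged back into the corpus as new document variables for downstream quanteda analysis.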
## Supported LLMs

The package supports all LLMs currently available with the ellmer package, including:

- Anthropic's Claude: `chat_anthropic()`
- AWS Bedrock: `chat_aws_bedrock()`
- Azure OpenAI: `chat_azure_openai()`
- Cloudflare: `chat_cloudflare()`
- Databricks: `chat_databricks()`
- DeepSeek: `chat_deepseek()`
- GitHub model marketplace: `chat_github()`
- Google Gemini/Vertex AI: `chat_google_gemini()`, `chat_google_vertex()`
- Groq: `chat_groq()`
- Hugging Face: `chat_huggingface()`
- Mistral: `chat_mistral()`
- Ollama: `chat_ollama()`
- OpenAI: `chat_openai()`
- OpenRouter: `chat_openrouter()`
- perplexity.ai: `chat_perplexity()`
- Snowflake Cortex: `chat_snowflake()` and `chat_cortex_analyst()`
- vLLM: `chat_vllm()`
For authentication and usage of each of these LLMs, please refer to the respective ellmer documentation. For example, to use the `chat_openai()` models, you need to sign up for an API key from OpenAI, which you can save in your `.Renviron` file as `OPENAI_API_KEY`. To use the `chat_ollama()` models, first download and install Ollama, then install some models either from the command line (e.g. with `ollama pull llama3.1`) or within R using the rollama package. The Ollama app must be running for the models to be used.
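Once a key is in place, an ellmer chat client can be created directly; the model name below is an illustrative assumption, not a recommendation:

```r
library(ellmer)

# assumes OPENAI_API_KEY has been set in .Renviron
chat <- chat_openai(model = "gpt-4o-mini")
```

A chat client (or the corresponding `chat_*` function) is what you then hand to the quanteda.llm functions described above.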
## Installation

You can install the development version of quanteda.llm from https://github.com/quanteda/quanteda.llm with:
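A typical GitHub installation, here using the remotes package (pak or devtools would work equally well):

```r
# install.packages("remotes")  # if not already installed
remotes::install_github("quanteda/quanteda.llm")
```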