Example: Structuring LLM responses for text analysis

The quanteda.llm package lets you structure LLM responses so that they are compatible with quanteda’s corpus principles and useful for common text analysis tasks, which makes it easy to integrate LLM-generated data into existing workflows. For example, you can ask an LLM to summarize every document in a corpus (ai_summary()), classify documents into topics (ai_salience()), or scale them on predefined criteria (ai_scale()), and store the results as document variables.

If you need more flexibility in how the LLM generates its output, use ai_text() to define custom prompts and response structures. Together with the type_object() function from the ellmer package, ai_text() lets you specify how the LLM should format its output, for example which fields the response contains and the type of each field, so that the output matches your analysis requirements.
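
As a minimal sketch of what such a response structure can look like, the example below builds a type specification with a description on each field. It assumes that ellmer's type_integer() and type_string() accept a description argument, as in current ellmer releases; the object name policy_scores_described and the wording of the descriptions are illustrative and not part of either package.

```r
library(ellmer)

# Hypothetical response structure: each field carries a description that
# tells the model what to put there (the wording is illustrative only).
policy_scores_described <- type_object(
  score = type_integer(
    description = "Integer from 0 (not at all left) to 3 (extremely left)"
  ),
  evidence = type_string(
    description = "Short justification for the score, citing the text"
  )
)

# Inspect the specification before handing it to ai_text().
print(policy_scores_described)
```

An object like this could be supplied through the type_object argument of ai_text() in place of a specification without descriptions.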

Loading packages and data

library(quanteda)

## Package version: 4.3.1
## Unicode version: 15.1
## ICU version: 74.2

## Parallel computing: disabled

## See https://quanteda.io for tutorials and examples.

library(quanteda.llm)

## Loading required package: ellmer

# keep the four most recent inaugural addresses (2013-Obama to 2025-Trump)
data_corpus_inaugural <- data_corpus_inaugural[57:60]

Using `ai_text()` for scoring documents

prompt <- "Score the following document on a scale of how much it aligns
with the political left. The political left is defined as groups which
advocate for social equality, government intervention in the economy,
and progressive policies. Use the following metrics:
SCORING METRIC:
3 : extremely left
2 : very left
1 : slightly left
0 : not at all left"

# define the structure of the response
policy_scores <- type_object(
  score = type_integer(),
  evidence = type_string()
)

result <- ai_text(data_corpus_inaugural, chat_fn = chat_openai, model = "gpt-4o", 
                  type_object = policy_scores,
                  system_prompt = prompt,
                  api_args = list(temperature = 0, seed = 42))

## Using `chat_openai()` with model "gpt-4o"
## ■                                0/4 |   0% | ETA: ? | NA
## 
## ■                                0/4 |   0% | ETA: ? | 2013-Obama
## 
## ■■■■■■■■■                        1/4 |  25% | ETA: 12s | 2013-Obama
## 
## ■■■■■■■■■                        1/4 |  25% | ETA: 12s | 2017-Trump
## 
## ■■■■■■■■■■■■■■■■                 2/4 |  50% | ETA:  6s | 2017-Trump
## 
## ■■■■■■■■■■■■■■■■                 2/4 |  50% | ETA:  6s | 2021-Biden
## 
## ■■■■■■■■■■■■■■■■■■■■■■■          3/4 |  75% | ETA:  3s | 2021-Biden
## 
## ■■■■■■■■■■■■■■■■■■■■■■■          3/4 |  75% | ETA:  3s | 2025-Trump
## 
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  4/4 | 100% | ETA:  0s | 2025-Trump
## 
## 
## 
## ✔ Returned 4 documents (4 successful, 0 with NAs)

# score and evidence are created as new docvars in the corpus
library(kableExtra)

# render the result as a styled HTML table
result %>%
  kable("html", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  column_spec(3)

id	score	evidence
2013-Obama	2	The speech emphasizes themes that align with the political left, such as social equality, government intervention, and progressive policies. It advocates for collective action to address modern challenges, supports the idea of a rising middle class, and stresses the importance of social safety nets like Medicare and Social Security. The speech also highlights the need for climate change action and sustainable energy, which are typically left-leaning priorities. Additionally, it calls for equal rights for women and the LGBTQ+ community, and for immigration reform, all of which are progressive issues. However, it also acknowledges skepticism of central authority and the importance of personal responsibility, which slightly tempers the alignment with the extreme left.
2017-Trump	0	The speech emphasizes nationalism, protectionism, and a focus on American interests, which are not typically aligned with the political left. It criticizes the political establishment and globalism, and promotes an ‘America First’ agenda. The speech does not advocate for social equality, government intervention in the economy, or progressive policies, which are key tenets of the political left. Instead, it focuses on reducing foreign influence and prioritizing American workers and industries, which aligns more with right-wing populism.
2021-Biden	2	The speech emphasizes themes of unity, democracy, and addressing systemic issues such as racial justice, climate change, and economic inequality, which align with progressive policies typically associated with the political left. The call for government intervention to address the pandemic, economic challenges, and racial justice further supports a left-leaning perspective. However, the speech also focuses heavily on unity and bipartisanship, which tempers the alignment with the political left, as it seeks to appeal to a broader audience beyond just left-leaning individuals.
2025-Trump	0	The speech emphasizes nationalism, border security, and a strong military, which are typically associated with right-wing politics. It criticizes government intervention in the economy, such as the Green New Deal and electric vehicle mandates, and promotes energy independence through drilling, which aligns with conservative economic policies. The speech also opposes government censorship and promotes a colorblind, merit-based society, which are not aligned with progressive policies. Overall, the speech does not advocate for social equality, government intervention in the economy, or progressive policies, which are key tenets of the political left.
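
Once scores like these are available, they can be joined back to the corpus for further analysis. The sketch below is one possible way to do that, under the assumption that the result is a data frame with id and score columns, as the table above suggests. To stay runnable without an API call it uses a stand-in data frame, result_mock, filled with the scores shown in the table; result_mock and the docvar name left_score are hypothetical names chosen for illustration.

```r
library(quanteda)

# Select the recent speeches by year rather than by position, so the
# example does not depend on the exact number of documents in the corpus.
corp <- corpus_subset(data_corpus_inaugural, Year >= 2013)

# Stand-in for the ai_text() output, using the scores from the table above.
result_mock <- data.frame(
  id = c("2013-Obama", "2017-Trump", "2021-Biden", "2025-Trump"),
  score = c(2L, 0L, 2L, 0L)
)

# Align scores with documents by name, not by row order.
docvars(corp, "left_score") <-
  result_mock$score[match(docnames(corp), result_mock$id)]

# Average score by party, using the Party docvar shipped with the corpus.
aggregate(left_score ~ Party, data = docvars(corp), FUN = mean)
```

Matching on document names keeps the scores attached to the right speeches even if the result rows come back in a different order than the corpus.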
