The ai_salience() function classifies documents by their relevance to a
set of user-defined topics. It uses a predefined type_object() from
ellmer to structure the LLM's response, so that for each document the
model returns every topic together with a salience score. This makes the
function particularly useful for analysing large corpora where manual
classification would be impractical. Users provide a character vector of
documents (a quanteda corpus also works, as in the example below) and a
character vector of topics. The LLM then analyses each document and
assigns a salience score to each topic, indicating how relevant the
document is to that topic.
Loading packages and data
library(quanteda)
library(quanteda.llm)
## Package version: 4.3.1
## Unicode version: 15.1
## ICU version: 74.2
## Parallel computing: disabled
## See https://quanteda.io for tutorials and examples.
## Loading required package: ellmer
# keep only the four most recent inaugural addresses (2013-2025)
data_corpus_inaugural <- data_corpus_inaugural[57:60]
Using ai_salience() for salience rating of topics
# define the topics for salience classification
topics <- c("economy", "environment", "healthcare")
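# score each document's salience for every topic with GPT-4o;
# temperature = 0 and a fixed seed make the output more reproducible
result <- data_corpus_inaugural %>%
  ai_salience(topics, chat_fn = chat_openai, model = "gpt-4o",
              api_args = list(temperature = 0, seed = 42))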
## Using `chat_fn()` with model "gpt-4o"
## ✔ Processed 4 documents successfully
result
id | salience_economy | salience_environment | salience_healthcare |
---|---|---|---|
2013-Obama | 0.4 | 0.3 | 0.3 |
2017-Trump | 0.6 | 0.1 | 0.3 |
2021-Biden | 0.2 | 0.3 | 0.5 |
2025-Trump | 0.5 | 0.2 | 0.3 |
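The printed result has one row per document and one salience_ column per topic. As a minimal follow-up sketch, the base R code below picks out the most salient topic in each document and totals the scores per row. It assumes that result is a data frame with the columns shown in the table above; so that the sketch runs without an API key, it rebuilds that data frame from the printed values. In that table each row's scores add up to 1, but whether ai_salience() always normalises its scores this way is an assumption worth checking on your own results.

```r
# Assumption: `result` is a data frame with an `id` column and one
# `salience_<topic>` column per topic, as in the printed output.
# It is rebuilt here from those printed values so the sketch runs offline.
result <- data.frame(
  id = c("2013-Obama", "2017-Trump", "2021-Biden", "2025-Trump"),
  salience_economy     = c(0.4, 0.6, 0.2, 0.5),
  salience_environment = c(0.3, 0.1, 0.3, 0.2),
  salience_healthcare  = c(0.3, 0.3, 0.5, 0.3)
)

# select the salience columns by their shared prefix
sal <- result[, startsWith(names(result), "salience_"), drop = FALSE]

# name of the highest-scoring topic in each document (first one wins ties)
top_topic <- sub("^salience_", "",
                 names(sal)[max.col(sal, ties.method = "first")])

# total salience per document (assumed, not guaranteed, to be 1)
total <- rowSums(sal)

data.frame(id = result$id, top_topic = top_topic, total = unname(total))
```

With the values above, this labels the 2013 Obama address and both Trump addresses as economy-led and the 2021 Biden address as healthcare-led. Using ties.method = "first" makes the choice deterministic when two topics share the top score; the default "random" would not be.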