[GH-ISSUE #3403] Read each row of CSV into LLM, write each response back out #27853

Closed
opened 2026-04-22 05:29:15 -05:00 by GiteaMirror · 3 comments

Originally created by @130jd on GitHub (Mar 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3403

I noticed some similar questions from Nov 2023 about reading a CSV in, but those pertained to analyzing the entire file at once.

I have a CSV with values in the first column, going down 10 rows. Each cell contains a question I want the LLM (local, using Ollama) to answer. I will give it few shot examples in the prompt.

I want it to process each question separately, with the instructions and few shot examples above each question. Almost like shutting it down each time, starting it back up, asking a question, then shutting it down and starting it back up, asking another...

Then for the output generated for each question, I want to write them each in an individual row of a new CSV.

It feels conceptually simple, but I'm new to this and looking for the closest existing example so I can follow along. Thanks in advance!
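The per-row loop described above can be sketched in Python against Ollama's REST API. This is a minimal sketch, not a definitive implementation: it assumes a local Ollama server on the default port, and the file names, few-shot examples, and model name are placeholders you would replace with your own.

```python
import csv
import json
import urllib.request

# Default local Ollama REST endpoint (assumes a running local server).
OLLAMA_URL = "http://localhost:11434/api/generate"

# Few-shot instructions prepended to every question (placeholder examples).
PREAMBLE = """Answer the question concisely.

Q: What is 2 + 2?
A: 4

Q: What color is the sky on a clear day?
A: Blue

"""

def build_prompt(question: str) -> str:
    """Combine the fixed instructions/examples with one question."""
    return PREAMBLE + "Q: " + question + "\nA:"

def ask(question: str, model: str = "mistral:instruct") -> str:
    """Send one question to Ollama and return the full (non-streamed) reply."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(question),
        "stream": False,  # get one JSON object back instead of a stream
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()

def process(in_path: str, out_path: str) -> None:
    """Read questions from column 1 of in_path; write question,answer rows."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            question = row[0]
            # One fresh request per row: each question is answered in
            # isolation, which matches the "restart each time" behavior.
            writer.writerow([question, ask(question)])

# Usage (requires a running Ollama server and a questions.csv):
# process("questions.csv", "answers.csv")
```

Each request carries the full preamble and a single question, so no conversation state leaks between rows, which is effectively the "shut it down and start it back up" behavior without actually restarting anything.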


@wangy4755 commented on GitHub (Mar 29, 2024):

I tried something similar in R (with the rollama package).

Long story short: I passed two columns (index and review text) in batches (batch size = 5), provided a JSON format example in the prompt, and asked the LLM to assign a sentiment score.

In my case it worked better with Mistral Instruct. Occasionally the loop may still break (a missed review text, 4 outputs instead of 5), but rerunning the same batch usually fixes it.

When I gradually increase the batch size, though, the error happens more often.

Someone requested a "grammar" parameter for Ollama so we can constrain the response format; until then, we can test and experiment with the prompt.

################################################################################
# Get the Survey Data
library(dplyr)
library(readr)
library(jsonlite)

################################################################################
# Data: https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews

# Load Data
source_data <- read_csv("https://raw.githubusercontent.com/AFAgarap/ecommerce-reviews-analysis/master/Womens%20Clothing%20E-Commerce%20Reviews.csv")
source_data <- as.data.frame(source_data)

glimpse(source_data)
summary(source_data)

source_data <- na.omit(source_data)
rownames(source_data) <- NULL

source_data$index <- seq_len(nrow(source_data))

################################################################################
# Subset Data for Time Saving
source_data<-source_data[1:500,]

################################################################################
# Input the Batch Size

# Number of Batch
batch_size = 5
num_batch = nrow(source_data) / batch_size

# Creating a Grouping Label for Batch Processing
source_data$batch<-rep(1:num_batch, each=batch_size)


################################################################################
# Local LLM Ready
library(rollama)

# List the Local Models
list_models(server = NULL)

options(rollama_model = "mistral:instruct")

################################################################################
final_output <- data.frame(matrix(ncol = 14, nrow = 0))
colnames(final_output) <- c("...1", "Clothing ID", "Age", "Title", 
                            "Review Text", "Rating", "Recommended IND", "Positive Feedback Count",
                            "Division Name", "Department Name", "Class Name", "batch",
                            "Score", "Reasoning")


################################################################################
# Use For Loop to fit each Batch to LLM
for (i in 1 : num_batch ) {
  
  # Subset Data for Testing Purpose    
  batch_data<-source_data[source_data$batch == i, ]
  
  # set up the task + content for LLM
  task="Conduct analysis on 5 text reviews, one by one.
        there are always 5 reviews, do not miss any one. 
        The output must include: Sentiment Score, Reasoning for giving Sentiment Score.
        each review need to have one score and one reasoning.
        sentiment_score need to be a score between -1.00 and 1.00. if the review does not contain enough information to determine the sentiment, then set the score to be 0.
        reasoning, explaining how this score is calculated, in plain text only. can not contain any symbols. can not contain quotation marks. can not quote word from review text. 
          
        Output must be in JSON Format, follow below example:
          
        [
          {\"sentiment_score\": -0.60, \"reasoning\": \"Reasoning for giving Sentiment Score\"},
          {\"sentiment_score\": -0.80, \"reasoning\": \"Reasoning for giving Sentiment Score\"},
          {\"sentiment_score\":  0.70, \"reasoning\": \"Reasoning for giving Sentiment Score\"},
          {\"sentiment_score\": -0.60, \"reasoning\": \"Reasoning for giving Sentiment Score\"},
          {\"sentiment_score\": -0.60, \"reasoning\": \"Reasoning for giving Sentiment Score\"}
        ]
          
        response start with \"[\" and end with \"]\", after that, no words or symbols.
        the output must always contain 5 JSON objects, no more and no less.
          
        Here are the review"
  
  # Flatten the two columns into one text block; pasting a data frame
  # directly would recycle the task string across columns
  content <- paste0(batch_data$index, ". ", batch_data$`Review Text`, collapse = "\n")
  
  # Merge the task and content for LLM
  q <- paste(task, content, sep = ":\n")
  
  print(paste("Batch:",i))
  
  LLM_output<-query(q,screen=TRUE, model_params = list(temperature = 0.0, seed=47))
  
  output_df <- fromJSON(LLM_output$message$content,simplifyDataFrame=FALSE)
  
  output_df <- data.frame(t(sapply(output_df,c)))
  
  output_df$sentiment_score <- as.numeric(output_df$sentiment_score)
  
  # Append the output (after existing rows, to keep batches in order)
  batch_output<-cbind(batch_data,output_df[ ,c("sentiment_score","reasoning")])
  final_output<-rbind(final_output,batch_output)
  
}
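The "rerun the broken batch" workaround described above can be automated: validate the model's reply before accepting it, and retry otherwise. A hedged Python sketch of that idea; `query_fn` and `max_tries` are placeholder names for illustration, not part of any rollama or Ollama API:

```python
import json

def parse_batch(raw: str, expected: int):
    """Return the parsed list if raw is a JSON array with `expected` items,
    each carrying a numeric sentiment_score; otherwise return None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list) or len(data) != expected:
        return None  # model missed a review or added an extra one
    ok = all(isinstance(d, dict) and
             isinstance(d.get("sentiment_score"), (int, float))
             for d in data)
    return data if ok else None

def query_with_retry(query_fn, prompt: str, expected: int, max_tries: int = 3):
    """Call query_fn(prompt) until the reply parses cleanly, up to max_tries."""
    for _ in range(max_tries):
        result = parse_batch(query_fn(prompt), expected)
        if result is not None:
            return result
    raise RuntimeError(
        f"no valid {expected}-item JSON array after {max_tries} tries")
```

With nonzero temperature (or a different seed per attempt), a retry gives the model a fresh chance to produce all five objects instead of silently accepting a short batch.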




@130jd commented on GitHub (Mar 29, 2024):

Thank you for sharing that. I haven't used R in almost 10 years and would need to relearn it. Did you consider implementing this in Python, or was R the only/most feasible way?


@wangy4755 commented on GitHub (Mar 29, 2024):

I was inspired by this example (in Python): https://www.youtube.com/watch?v=h_GTxRFYETY&t=727s
Hope this helps.


Reference: github-starred/ollama#27853