Link to paper

https://arxiv.org/abs/2308.14296

Notes

The paper proposes a novel algorithm, Self-Inspiring, to improve the planning ability of an LLM agent. At each intermediate planning step, the LLM “self-inspires” by considering all previously explored states when planning the next step.

  • Most existing RSs, such as DNN-based methods (e.g., CNN and LSTM) and pre-trained language models (e.g., BERT), cannot sufficiently capture textual knowledge about users and items due to limitations in model scale and data size.
  • Besides, most existing RS methods have been designed for specific tasks and are inadequate in generalizing to unseen recommendation tasks.
  • Existing studies primarily rely on knowledge stored within the model’s weights, neglecting the potential benefits of leveraging external tools to access real-time information and domain-specific knowledge (Yang et al. 2023; Bao et al. 2023). Furthermore, the reasoning ability of LLMs for recommendation tasks is not fully utilized in current research, resulting in suboptimal predictions due to the intricate nature of recommendation-related tasks (Liu et al. 2023).
  • Key Components
    • Planning: Enables the agent to break complex recommendation tasks into manageable steps. Consider the setting where the goal is to generate the final result y given problem x via an LLM agent parameterized by θ. The traditional input-output method gives the result by y ∼ pθ(y|x). With planning, RecMind generates the result y ∼ pθ(y|planning(x)), where planning(x) is a set of prompts that decomposes problem x into a series of sub-tasks, each composed of a thought h, an action a, and an observation o (see the sketch after this list).
    • Memory: Consists of personalized memory and world knowledge. Personalized memory includes individualized user information, such as a user’s reviews or ratings for a particular item. World knowledge consists of two components: the first is item metadata information, which also falls under domain-specific knowledge; the second is real-time information that can be accessed through a web search tool.
    • Tools: Enhance the agent’s functionality on top of the LLM.
  • Algorithm proposed in the paper: At each planning step, the agent "self-inspires" by considering all previously explored planning paths when exploring the next planning state.
    • Unlike existing Chain-of-Thoughts (CoT) (Wei et al. 2022) and Tree-of-Thoughts (ToT) (Yao et al. 2023), which discard states (thoughts) from previously explored paths when generating a new state, SI retains all previous states from all historical paths when generating a new state.
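
A minimal sketch of how such a planning loop might look. This is a reading of the setup above, not the paper's actual implementation; the `llm` and `tools` interfaces and the `render_prompt` helper are assumptions for illustration.

```python
def plan(llm, task, tools, max_steps=5):
    """llm: callable prompt -> (thought, tool_name, tool_input); tools: dict of callables."""
    history = []                                    # list of (thought, action, observation) triples
    for _ in range(max_steps):
        prompt = render_prompt(task, history)       # y ~ p_theta(y | planning(x))
        thought, tool_name, tool_input = llm(prompt)
        if tool_name == "finish":                   # the model decides it can answer directly
            return tool_input, history
        observation = tools[tool_name](tool_input)  # e.g. SQL query, web search, summarizer
        history.append((thought, (tool_name, tool_input), observation))
    return None, history

def render_prompt(task, history):
    # Personalized memory / world knowledge surfaces here as prior observations.
    lines = [f"Task: {task}"]
    for i, (thought, action, obs) in enumerate(history, 1):
        lines.append(f"Step {i}: thought={thought}; action={action}; observation={obs}")
    lines.append("Next step:")
    return "\n".join(lines)
```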

Literature survey

  • LLMs as agents
    • The central concept is to leverage LLMs to produce text-based outputs and actions that can then be used for making API calls and performing operations within a specific environment.
    • By enabling LLMs to utilize tools, we can enhance their capacity to tap into a much broader and more dynamic knowledge space.
    • A number of successful applications have emerged, including ReAct (Yao et al. 2022), Toolformer (Schick et al. 2023), HuggingGPT (Shen et al. 2023), generative agents (Park et al. 2023), WebGPT (Nakano et al. 2021), AutoGPT (Gravitas 2023), BabyAGI (Nakajima 2023), and Langchain (Chase 2023).
  • LLMs for recommendation
    • Current LLM-based recommender systems are primarily designed for rating prediction (Kang et al. 2023; Bao et al. 2023) and sequential recommendation tasks (Wang and Lim 2023; Yang et al. 2023; Hou et al. 2023)
    • In both tasks, a user’s previous interactions with items, along with other optional data like the user profile or item attributes, are concatenated to formulate a natural language prompt. This is then fed into an LLM with options for no fine-tuning (Lim 2023), full-model fine-tuning (Yang et al. 2023), or parameter-efficient fine-tuning (Bao et al. 2023).
    • In the sequential recommendation task, to reduce the search space and better tailor it to each dataset, an optional pre-filtered set of item candidates is included in the input prompts. This ensures the model generates the final ranked list based on that specific set.

Architecture

  • Chain-of-Thoughts (CoT)
    • The CoT planning method follows a single path in the reasoning tree.
    • The next state is generated based on a probabilistic function of the current state, until a final state is reached.
  • Tree-of-Thoughts (ToT)
    • Extends CoT to explore multiple paths in the reasoning tree.
    • As in CoT, each next state is generated from the current state only; states on abandoned branches are discarded.
    • Eventually, ToT-BFS generates a single path similar to CoT. In contrast, ToT-DFS explores one branch at a time, but might prune the current state and backtrack to the previous state to start a new reasoning branch.
  • RecMind algorithm: Self-Inspiring (SI)
    • SI inspires itself to explore an alternative reasoning branch while retaining all previous states (see the toy sketch below).
    • The next output is a probabilistic function not only of the current state but of all states explored so far.
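
A toy contrast of how the next state is conditioned under each strategy, paraphrasing the notes above rather than the paper's pseudocode; `propose` stands in for a single LLM call.

```python
def next_state_cot_tot(propose, current_path):
    # CoT / ToT: the new state is generated only from the states on the current path;
    # thoughts on abandoned branches are discarded.
    return propose(current_path)

def next_state_si(propose, all_explored_states):
    # Self-Inspiring: the new state is conditioned on every state from every
    # previously explored branch (including the current path), so intermediate
    # findings from abandoned branches are not lost.
    return propose(all_explored_states)
```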

Tools they used

1) DB tool
   1) To give RecMind access to in-domain knowledge, all review data is stored in a MySQL database, consisting of a table with product meta information and a table with the interaction history of all users (a minimal sketch follows this list).
2) Web search tool
3) Text summarization tool
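
A minimal sketch of what the DB tool could look like, assuming the two-table layout described above. Table and column names, and the use of sqlite3 instead of MySQL, are simplifications for illustration.

```python
import sqlite3

# Hypothetical schema mirroring the two tables described in the notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items(item_id TEXT, title TEXT, category TEXT)")
conn.execute("CREATE TABLE interactions(user_id TEXT, item_id TEXT, rating REAL, review TEXT)")

def db_tool(sql: str) -> str:
    """Run a read-only SQL query produced by the agent and return the rows as text."""
    rows = conn.execute(sql).fetchall()
    return "\n".join(str(row) for row in rows) or "no results"

print(db_tool("SELECT rating, review FROM interactions WHERE user_id = 'A123'"))
```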

Experimental Results on Precision-oriented Recommendation Tasks

1) Rating Prediction
   1) Rating prediction is an essential task in recommendation systems that aims to predict the rating a user would give to a particular item.
   2) RecMind with different types of planning mechanisms usually outperforms fully-trained models on rating prediction. The improvement mainly stems from the fact that RecMind has access both to the rating history the user has given to different items and to the rating history the item has received from different users in the database.
2) Direct Recommendation
   1) RecMind predicts the recommended items from a candidate set of 100 items from the same dataset, where only one candidate is positive.
   2) For a specific user {userID} with a list of products, the agent is prompted: “From the item candidates listed, choose the top 10 items to recommend to the user {userID} and rank them in order of priority from highest to lowest. Candidates: [‘Item List’]” (prompt assembly is sketched after this list).
   3) Before evaluating each test example, the interaction history between the positive item and the user is removed to avoid data leakage.
   4) The results show that fully-trained models such as P5 usually perform better than RecMind.
   5) Specifically, the LLM agent tends to first retrieve information related to items positioned at the front of the candidate list. Such positional bias has also been observed in previous works (Liu et al. 2023).
   6) Diverse reasoning planning, such as tree-of-thoughts and the proposed self-inspiring, can alleviate this issue by gradually filtering out less likely items. However, it is still hard for LLMs to fully explore all candidates, especially given limitations on prompt context length.
3) Sequential Recommendation
   1) For sequential recommendation, the agent takes the names of the user’s historically interacted items, in chronological order, as input. The agent is then prompted to predict the title of the next item the user might interact with.
   2) For a specific user {userID} with an interaction history in chronological order, the agent is prompted: “user {userID} has interacted with the following items in chronological order: [‘Item List’]. Please recommend the next item that the user might interact with. Choose the top 10 products to recommend in order of priority, from highest to lowest.”
4) Explainability
   1) In explanation generation, we assess RecMind’s performance in crafting textual explanations that justify a user’s interaction with a specific item.
   2) The results indicate that RecMind, when leveraging self-inspiring, achieves performance comparable to the fully trained P5 model. This is aided by the in-domain knowledge retrieved from personalized memory, such as reviews from other users of the same item.
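
For concreteness, a sketch of how the two evaluation prompts quoted above and the leakage-avoidance step might be assembled. The helper names and the dict-based interaction records are assumptions, not the paper's code.

```python
def direct_rec_prompt(user_id, candidates):
    # Direct recommendation prompt as quoted in the notes above.
    return (f"From the item candidates listed, choose the top 10 items to recommend "
            f"to the user {user_id} and rank them in order of priority from highest "
            f"to lowest. Candidates: {candidates}")

def sequential_rec_prompt(user_id, history_items):
    # Sequential recommendation prompt as quoted in the notes above.
    return (f"user {user_id} has interacted with the following items in chronological "
            f"order: {history_items}. Please recommend the next item that the user "
            f"might interact with. Choose the top 10 products to recommend in order "
            f"of priority, from highest to lowest.")

def remove_leakage(interactions, user_id, positive_item):
    # Drop the held-out (user, positive item) interaction so the agent cannot
    # retrieve it from personalized memory during evaluation.
    return [r for r in interactions
            if not (r["user_id"] == user_id and r["item_id"] == positive_item)]
```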

The main benefit of using an LLM over a traditional recommender system is the ability to transfer its reasoning to other domains.
