The previously developed Paper-Agent has recently been refactored and upgraded, with the main changes as follows:
Support for Multiple Large Models
Previously, only deepseek and kimi were integrated, with Kimi serving as the primary model for PDF Q&A. A significant issue with this setup is that Kimi's API is quite expensive: a complete ten-question Q&A session over one paper can cost more than a dollar. Kimi officially offers a Context Caching feature to reduce token usage, but I haven't explored it yet; if the project comes to rely more on Kimi in the future, I may implement it when time permits.
To reduce costs, I switched to deepseek, currently the most affordable large model, which means PDF text extraction has to be handled locally. I use PyMuPDF for extraction, and the results are satisfactory.
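For reference, the extraction itself can be as short as the following sketch (the function name extract_pdf_text is just for illustration):

import fitz  # PyMuPDF

def extract_pdf_text(pdf_path: str) -> str:
    # Open the PDF and concatenate the plain text of every page.
    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)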
The local code was also reorganized. Previously, the deepseek and kimi code overlapped significantly, so this time an abstract LLM base class was defined, with separate implementation classes for the different scenarios:
from abc import ABC, abstractmethod

# llm_config is the configuration dict defined in model.py (shown below).
class LLM(ABC):
    def __init__(self, model_name: str):
        conf = llm_config.get(model_name)
        self.model_name = conf['model_name']
        self.api_key = conf['api_key']
        self.base_url = conf['base_url']

    @abstractmethod
    def chat(self, message: str, system_prompt: str = "") -> str:
        pass

    @abstractmethod
    def chat_pdf(self, message: str, file_content) -> str:
        pass
Both deepseek and kimi can be called directly through OpenAI's SDK, so they were unified under an OpenAI-style implementation, OpenAiLlm. A second implementation, OllamaLlm, targets a locally deployed Ollama, so anyone who can run large models locally can avoid API costs.
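A minimal sketch of what OpenAiLlm can look like with the official openai SDK (illustrative, not the project's verbatim code):

from openai import OpenAI

class OpenAiLlm(LLM):
    def __init__(self, model_name: str):
        super().__init__(model_name)
        # deepseek and kimi both expose OpenAI-compatible endpoints,
        # so one client works for either as long as base_url is set.
        self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)

    def chat(self, message: str, system_prompt: str = "") -> str:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": message})
        resp = self.client.chat.completions.create(model=self.model_name, messages=messages)
        return resp.choices[0].message.content

    def chat_pdf(self, message: str, file_content) -> str:
        # For deepseek, the locally extracted PDF text is simply passed in the prompt.
        return self.chat(f"{file_content}\n\n{message}")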
Since Kimi has file management capabilities, it got its own implementation class. Which model to use is decided simply by instantiating the corresponding class:
current_llm = KimiLlm() if use_kimi else OllamaLlm('qwen') if use_ollama else OpenAiLlm('deepseek')
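As a hedged sketch (not the project's exact code), KimiLlm can reuse the OpenAI client and hand the PDF to Kimi's file-extract upload instead of extracting text locally; treating it as a subclass of OpenAiLlm is my assumption here:

from pathlib import Path

class KimiLlm(OpenAiLlm):
    def __init__(self, model_name: str = "kimi"):
        super().__init__(model_name)

    def chat_pdf(self, message: str, file_content) -> str:
        # file_content is assumed to be a local PDF path; Kimi extracts the
        # text server-side via its file upload ("file-extract") feature.
        file_object = self.client.files.create(file=Path(file_content), purpose="file-extract")
        extracted = self.client.files.content(file_id=file_object.id).text
        return self.chat(message, system_prompt=extracted)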
The model configuration can be specified in the llm_config dict in model.py:
import os

llm_config = {
    "deepseek": {
        "model_name": "deepseek-chat",
        "api_key": os.environ.get('DEEPSEEK_KEY'),
        "base_url": "https://api.deepseek.com"
    },
    "kimi": {
        "model_name": "moonshot-v1-128k",
        "api_key": os.environ.get('KIMI_KEY'),
        "base_url": "https://api.moonshot.cn/v1"
    },
    "qwen": {
        "model_name": "qwen2",
        "api_key": "not needed",
        "base_url": "address of ollama"
    }
}
Here, deepseek and kimi use APIs, while qwen uses Ollama.
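For completeness, a minimal sketch of what OllamaLlm can look like against Ollama's /api/chat endpoint (illustrative; the real class may differ):

import requests

class OllamaLlm(LLM):
    def chat(self, message: str, system_prompt: str = "") -> str:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": message})
        resp = requests.post(
            f"{self.base_url}/api/chat",
            json={"model": self.model_name, "messages": messages, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    def chat_pdf(self, message: str, file_content) -> str:
        # Same approach as deepseek: locally extracted text goes into the prompt.
        return self.chat(f"{file_content}\n\n{message}")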
Other Changes
In addition to the model changes, two extra buttons were added to the page: Generate All and Export MD. These buttons allow you to generate answers to all questions with one click (though it may take some time) and export the page content to Markdown.
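Roughly, the wiring in Streamlit looks like the sketch below; questions, answer_question, and the use of current_llm are hypothetical stand-ins for the project's own names:

import streamlit as st

# Hypothetical stand-ins for the project's question list and Q&A call.
questions = ["What problem does the paper address?", "What is the key method?"]

def answer_question(question: str) -> str:
    return current_llm.chat(question)  # current_llm as selected earlier

if "answers" not in st.session_state:
    st.session_state.answers = {}

if st.button("Generate All"):
    for q in questions:
        st.session_state.answers[q] = answer_question(q)

md_text = "\n\n".join(f"## {q}\n\n{a}" for q, a in st.session_state.answers.items())
st.download_button("Export MD", data=md_text, file_name="paper.md")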
The configuration for system prompts and the ten questions for papers can be modified in prompt_template.py.
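The exact wording there is project-specific; as a hypothetical illustration, the file's shape is roughly a system prompt plus a question list (the names SYSTEM_PROMPT and QUESTIONS are my assumptions):

# prompt_template.py: illustrative shape only, not the actual wording
SYSTEM_PROMPT = "You are an assistant that answers questions about an academic paper."

QUESTIONS = [
    "What problem does the paper try to solve?",
    # ... the remaining nine questions go here
]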
Additionally, if you only need to export Markdown based on the paper, you can directly use the newly added flow.py. Modify the paper URL in it and run it to generate a Markdown document containing the paper title, abstract, and Q&A, which is very suitable for quickly understanding the paper content.
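A rough sketch of what such a flow boils down to; the arxiv package, the helper names, and the reuse of QUESTIONS and current_llm from the sketches above are all assumptions rather than the actual contents of flow.py:

from pathlib import Path
import arxiv  # pip install arxiv; assumed here for downloading the paper

PAPER_URL = "https://arxiv.org/abs/xxxx.xxxxx"  # change this to your paper

paper_id = PAPER_URL.rsplit("/", 1)[-1]
paper = next(arxiv.Client().results(arxiv.Search(id_list=[paper_id])))
paper.download_pdf(filename="paper.pdf")
text = extract_pdf_text("paper.pdf")  # the PyMuPDF sketch above

lines = [f"# {paper.title}", "", "## Abstract", "", paper.summary, "", "## Q&A", ""]
for question in QUESTIONS:  # the ten questions from prompt_template.py
    lines += [f"### {question}", "", current_llm.chat_pdf(question, text), ""]
Path("paper.md").write_text("\n".join(lines), encoding="utf-8")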
Project Update Status
Updates to this project may be paused for now. Streamlit imposes too many limitations, and the project currently works more like an arxiv-helper than a paper-agent, since it can only handle papers from arXiv. The code does include an aminer interface, but it is not used yet. In the future, I might start a new project to build a web-based paper management platform.
Appendix: Using Colab to Deploy Ollama Locally
My machine isn't powerful enough to run most models, but I often use Colab for learning and development, and it is more than sufficient for deploying Ollama. To use that deployment locally, you need a service called Ngrok (https://ngrok.com/), which many of you may have heard of: we need to expose the Ollama instance running on Colab through Ngrok.
After creating an Ngrok account, you can find your token under the Your Authtoken menu on the Dashboard. Store it as a secret in the Colab notebook by clicking the key-shaped button in Colab's left sidebar. Once configured, you can read the token like this:
from google.colab import userdata
token = userdata.get('NGROK_TOKEN')
Next, start Ollama on Colab and connect it with Ngrok. The code is self-explanatory. I found an introduction on Juejin and copied some of the code with simplifications. I am sharing the notebook link here. You can execute it with one click.
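For reference, the core of such a notebook cell looks roughly like this, assuming pyngrok and the official Ollama install script (details may differ from the shared notebook):

# Colab cell sketch: install and start Ollama, then expose it through Ngrok.
!curl -fsSL https://ollama.com/install.sh | sh
!pip install -q pyngrok

import subprocess, time
from google.colab import userdata
from pyngrok import ngrok

subprocess.Popen(["ollama", "serve"])  # serve Ollama in the background
time.sleep(5)                          # give the server a moment to start

ngrok.set_auth_token(userdata.get('NGROK_TOKEN'))
# Tunnel Ollama's default port; rewriting the Host header avoids rejected requests.
tunnel = ngrok.connect(11434, host_header="localhost:11434")
print(tunnel.public_url)  # use this address as the Ollama base_url
# Next, pull the model you want to use (see below).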
The only modification you might need is choosing which model to use. For example, here we use qwen2:
!ollama pull qwen2
The notebook will print the Ngrok proxy address, which can be plugged directly into the base_url of the qwen entry in llm_config above. Then you can enjoy the model served by Ollama.