
# Doc Search
Converse with a book (PDF)
See [tweet]( for a full demo.
**Documentation**: [](
**Source Code**: [](
**PyPI**: [](
## Prerequisites

- [Tesseract OCR](
- [ImageMagick](

> **Note:**
> If you are using Windows, make sure that you set the location
> of the ImageMagick executable in the `IMCONV` environment variable.

```shell
# For example, if you have installed ImageMagick in %PROGRAMFILES%\ImageMagick-7.1.0-Q16-HDRI
set IMCONV="%PROGRAMFILES%\ImageMagick-7.1.0-Q16-HDRI\magick"
```
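
The README doesn't say how `IMCONV` is read internally. As a sketch under that assumption, a Python prerequisite check could prefer `IMCONV` when it is set and otherwise look for ImageMagick 7's `magick` binary and for `tesseract` on `PATH`. The helpers `find_imagemagick` and `find_tesseract` and their fallback order are hypothetical, not dr-doc-search's actual logic.

```python
import os
import shutil
from typing import Mapping, Optional


def find_imagemagick(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: locate the ImageMagick executable.

    Assumption: an explicit IMCONV value wins; otherwise fall back to
    ImageMagick 7's `magick` binary on PATH.
    """
    env = os.environ if env is None else env
    configured = env.get("IMCONV")
    if configured:
        # cmd.exe keeps the quotes from `set IMCONV="..."` in the value, so drop them.
        return configured.strip('"')
    return shutil.which("magick")


def find_tesseract() -> Optional[str]:
    """Hypothetical helper: the Tesseract CLI binary is named `tesseract`."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    print("ImageMagick:", find_imagemagick() or "not found")
    print("Tesseract:", find_tesseract() or "not found")
```

Running it before `pip install` is a quick way to spot a missing prerequisite; a `not found` line means that tool is neither configured nor on `PATH`.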
## Installation

```shell
pip install dr-doc-search
```
## Example Usage
There are two steps to use this application:
**1.** First, you need to create the index and generate embeddings for the PDF file.
Here I'm using a PDF file generated from this page [Parable of a Monetary Economy
Before running this, you need to set up your OpenAI API key. You can get it from [OpenAI](
> From version 1.5.0, you can skip OpenAI and use HuggingFace models to generate embeddings and answers.

```shell
export OPENAI_API_KEY=<your-openai-api-key>
```

Then run the following command to start the training process:

```shell
dr-doc-search --train -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
```

To use HuggingFace models for generating embeddings, pass `--embedding huggingface`:

```shell
dr-doc-search --train -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --embedding huggingface
```
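
The README doesn't document what the generated index contains. As a generic, hypothetical illustration of why step 1 produces embeddings (not dr-doc-search's implementation), answering a question starts by comparing the question's embedding with the stored page embeddings and keeping the closest pages. The tiny hand-written vectors below stand in for real embeddings from OpenAI or HuggingFace models, and `top_k` is an illustrative name.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(index: Dict[str, List[float]], query: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """Rank indexed pages by cosine similarity to the query embedding."""
    scored = [(page, cosine(vec, query)) for page, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding models produce hundreds of dimensions or more.
toy_index = {
    "output-1.txt": [0.9, 0.1, 0.0],
    "output-9.txt": [0.1, 0.8, 0.3],
    "output-10.txt": [0.0, 0.2, 0.9],
}

question = [0.2, 0.9, 0.2]
print(top_k(toy_index, question, k=1))
```

With these toy values `output-9.txt` ranks first; the retrieved page text is what a language model would then use to compose an answer.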
The training process generates some temporary files in the `OutputDir/dr-doc-search/<pdf-name>` folder under your home directory.
Here is what it looks like:

```
$ tree
├── images
│   ├── output-1.png
│   ├── output-10.png
│   ├── output-11.png
│   └── output-9.png
├── index
│   ├── docsearch.index
│   └── index.pkl
├── parable-of-a-monetary-economy-heteconomist.pdf
└── scanned
    ├── output-1.txt
    └── output-9.txt
```

> **Note:**
> It is possible to change the base of the output directory by providing the `--app-dir` argument.
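
For scripts that work with these files, here is a small sketch that computes where they should appear. It assumes that `<pdf-name>` is the PDF file name without the `.pdf` extension and that `--app-dir` replaces the default `~/OutputDir` base; both are guesses from the description above, so compare them with a real run. `output_dir_for` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def output_dir_for(pdf_path: str, app_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: guess the folder where training output is written.

    Assumptions: the default base is ~/OutputDir, `--app-dir` replaces that
    base, and <pdf-name> is the PDF file name without its extension.
    """
    base = Path(app_dir).expanduser() if app_dir else Path.home() / "OutputDir"
    return base / "dr-doc-search" / Path(pdf_path).expanduser().stem


if __name__ == "__main__":
    out = output_dir_for("~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf")
    for sub in ("images", "scanned", "index"):
        # Print each expected subfolder and whether it exists yet.
        print(out / sub, (out / sub).exists())
```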
**2.** Now that we have the index, we can use it to start asking questions.

```shell
dr-doc-search -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --input-question "How did the attempt to reduce the debt result in a decrease in employment?"
```

Alternatively, you can open a web interface (on port 5006) to ask questions:

```shell
dr-doc-search --web-app -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
```

To use a HuggingFace model instead of OpenAI for answering, pass `--llm huggingface`:

```shell
dr-doc-search --web-app -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --llm huggingface
```

There are more options, such as choosing the start and end pages of the PDF file.
See the help for more details:

```shell
dr-doc-search --help
```
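
If you drive the CLI from Python (for example, to train on and then query several PDFs in a batch), a minimal sketch can assemble the arguments for `subprocess`. It uses only the flags shown in this README and assumes `dr-doc-search` is on your `PATH` after `pip install`; `build_args` and `run` are hypothetical wrappers, not part of dr-doc-search.

```python
import os
import shlex
import subprocess
from typing import List, Optional


def build_args(
    pdf: str,
    *,
    train: bool = False,
    web_app: bool = False,
    question: Optional[str] = None,
    embedding: Optional[str] = None,
    llm: Optional[str] = None,
    app_dir: Optional[str] = None,
) -> List[str]:
    """Hypothetical wrapper: assemble a dr-doc-search command line.

    Only flags documented in the README are used.
    """
    args = ["dr-doc-search"]
    if train:
        args.append("--train")
    if web_app:
        args.append("--web-app")
    args += ["-i", pdf]
    if question:
        args += ["--input-question", question]
    if embedding:
        args += ["--embedding", embedding]
    if llm:
        args += ["--llm", llm]
    if app_dir:
        args += ["--app-dir", app_dir]
    return args


def run(args: List[str]) -> None:
    """Execute the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # subprocess does not expand `~` (no shell is involved), so expand it here.
    pdf = os.path.expanduser("~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf")
    cmd = build_args(pdf, question="How did the attempt to reduce the debt result in a decrease in employment?")
    # shlex.quote keeps the question intact if you paste the printed line into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing a list rather than one command string avoids quoting problems with questions that contain spaces or quotes. Call `run(...)` with `train=False` only once the index from step 1 exists.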
## Acknowledgements
- [anton/@abacaj]( for the idea
- [LangChain](
- [HoloViz Panel](
- [OpenAI](
## Development
* Clone this repository
* Requirements:
  * Python 3.7+
  * [Poetry](
* Create a virtual environment and install the dependencies

```shell
poetry install
```

* Activate the virtual environment

```shell
poetry shell
```

### Validating build

```shell
make build
```

### Release process
A release is automatically published when a new version is bumped using `make bump`.
See `.github/workflows/build.yml` for more details.
Once the release is published, `.github/workflows/publish.yml` will automatically publish it to PyPI.
## Disclaimer
This project is not affiliated with OpenAI.
The OpenAI API and GPT-3 language model are not free after the trial period.