Doc Search

Converse with a book (PDF)

See tweet for full demo.

Documentation: https://namuan.github.io/dr-doc-search

Source Code: https://github.com/namuan/dr-doc-search

PyPI: https://pypi.org/project/dr-doc-search/

Pre-requisites

Installation

pip install dr-doc-search

Example Usage

There are two steps to use this application:

1. First, create the index and generate embeddings for the PDF file. Here I'm using a PDF file generated from the page "Parable of a Monetary Economy".

Before running this, you need to set up your OpenAI API key. You can get it from OpenAI.

export OPENAI_API_KEY=<your-openai-api-key>

Then run the following command to start the training process:

dr-doc-search --train -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
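If you drive the training step from a script, it can help to fail early when the API key is missing. The following is a minimal sketch, not part of dr-doc-search itself: the helper name `build_train_command` and its `env` parameter are hypothetical, and it only uses the `--train` and `-i` flags shown above.

```python
import os
from pathlib import Path


def build_train_command(pdf_path, env=None):
    """Return the argv list for the training step, or raise if setup looks incomplete.

    Hypothetical helper: it only assembles the command shown in this README.
    """
    env = os.environ if env is None else env
    # The README asks for OPENAI_API_KEY to be exported before training.
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set; export it before training")
    pdf = Path(pdf_path).expanduser()
    if pdf.suffix.lower() != ".pdf":
        raise ValueError(f"expected a PDF file, got: {pdf}")
    # Passing a list (not a shell string) avoids quoting problems with odd file names.
    return ["dr-doc-search", "--train", "-i", str(pdf)]
```

The returned list can be handed to `subprocess.run(command, check=True)` once dr-doc-search is installed.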

The training process generates some temporary files in the OutputDir/dr-doc-search/<pdf-name> folder under your home directory. Here is what it looks like:

 ~/OutputDir/dr-doc-search/parable-of-a-monetary-economy-heteconomist
$ tree
.
├── images
│ ├── output-1.png
│ ├── output-10.png
│ ├── output-11.png
...
│ └── output-9.png
├── index
│ ├── docsearch.index
│ └── index.pkl
├── parable-of-a-monetary-economy-heteconomist.pdf
└── scanned
    ├── output-1.txt
    ...
    └── output-9.txt

Note: It is possible to change the base of the output directory by providing the --app-dir argument.
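To check from a script whether training has already produced an index, you can look for the files shown in the tree above. This is a rough sketch under two assumptions: the folder name is the PDF file name without its extension (as the listing suggests), and a custom base directory still contains the `dr-doc-search/<pdf-name>` subfolders. Both helper names are hypothetical.

```python
from pathlib import Path


def expected_output_dir(pdf_path, base=None):
    """Guess the output folder for a PDF, following the layout shown in this README."""
    # Default base is ~/OutputDir; pass `base` to mirror a custom --app-dir (assumed layout).
    base = Path.home() / "OutputDir" if base is None else Path(base)
    # Assumption: <pdf-name> is the file name without the .pdf extension.
    return base / "dr-doc-search" / Path(pdf_path).stem


def index_is_ready(output_dir):
    """Return True if both index files from the tree listing exist."""
    index_dir = Path(output_dir) / "index"
    return all((index_dir / name).is_file() for name in ("docsearch.index", "index.pkl"))
```

For example, `index_is_ready(expected_output_dir("~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf"))` tells you whether the training step still needs to run.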

2. Now that we have the index, we can use it to start asking questions.

dr-doc-search -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --input-question "How did the attempt to reduce the debt result in a decrease in employment?"
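To ask several questions against the same PDF, the command above can be generated per question. The sketch below is illustrative only: `build_question_commands` and `as_shell` are hypothetical helpers, and only the `-i` and `--input-question` flags from this README are used.

```python
import shlex
from pathlib import Path


def build_question_commands(pdf_path, questions):
    """Return one argv list per non-empty question, using the flags shown in this README."""
    pdf = str(Path(pdf_path).expanduser())
    return [
        ["dr-doc-search", "-i", pdf, "--input-question", question]
        for question in questions
        if question.strip()  # skip blank entries
    ]


def as_shell(command):
    """Render an argv list as a copy-pasteable shell line with safe quoting."""
    return " ".join(shlex.quote(arg) for arg in command)
```

Each list can be passed to `subprocess.run`, or printed with `as_shell` to review the exact command before running it.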

Or you can open a web interface (on port 5006) to ask questions:

dr-doc-search --web-app -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
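If you start the web interface from a script, you may want to wait until the port accepts connections before opening a browser. This is a generic sketch using only the standard library; the function name `wait_for_port` is hypothetical, and the host `127.0.0.1` is an assumption, since this README only states the port.

```python
import socket
import time


def wait_for_port(port=5006, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

This only confirms that something is listening on the port, not that the application has finished loading the index.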

There are more options, such as choosing the start and end pages of the PDF file. See the help for more details:

dr-doc-search --help

Acknowledgements

Development

- Clone this repository
- Requirements:
  - Python 3.7+
  - Poetry
- Create a virtual environment and install the dependencies

poetry install

Activate the virtual environment

poetry shell

Validating build

make build

Release process

A release is automatically published when a new version is bumped using make bump. See .github/workflows/build.yml for more details. Once the release is published, .github/workflows/publish.yml will automatically publish it to PyPI.

Disclaimer

This project is not affiliated with OpenAI. The OpenAI API and GPT-3 language model are not free after the trial period.