Spaces: anakin87/who-killed-laura-palmer

Stefano Fiorucci committed on May 24, 2022

Commit a8158b1 • 1 Parent(s): 261cff9

big improvement in documentation

Files changed (6)

README.md +11 -1
app_utils/README.md +8 -0
data/README.md +10 -0
data/readme_images/spaces_logo.png +0 -0
data/readme_images/webapp.png +0 -0
notebooks/README.md +30 -0

README.md CHANGED

@@ -11,11 +11,19 @@ license: Apache-2.0
 ---
 # Who killed Laura Palmer? &nbsp; [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
 ## 🗻🗻 Twin Peaks Question Answering system
 WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [🔍 Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
 ---
 ## Project architecture 🧱
@@ -35,6 +43,8 @@ WKLP is a simple Question Answering system, based on data crawled from [Twin Pea
 - How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
 - How to optimize the web app to 🚀 deploy in [🤗 Spaces](https://huggingface.co/spaces)
 ## Repository structure 📁
 - [app.py](./app.py): Streamlit web app
 - [app_utils folder](./app_utils/): python modules used in the web app
@@ -46,7 +56,7 @@ Within each folder, you can find more in-depth explanations.
 ## Possible improvements ✨
 - The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
-- You can also think about preparing a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
 - ...

 ---
 # Who killed Laura Palmer? &nbsp; [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
+[<img src="./data/readme_images/spaces_logo.png" style="display: block;margin-left: auto;
+  margin-right: auto;  max-width: 70%;">](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer)
 ## 🗻🗻 Twin Peaks Question Answering system
 WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [🔍 Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
+  - [Project architecture 🧱](#project-architecture-)
+  - [What can I learn from this project? 📚](#what-can-i-learn-from-this-project-)
+  - [Repository structure 📁](#repository-structure-)
+  - [Possible improvements ✨](#possible-improvements-)
 ---
 ## Project architecture 🧱
 - How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
 - How to optimize the web app to 🚀 deploy in [🤗 Spaces](https://huggingface.co/spaces)
+![Web app preview](./data/readme_images/webapp.png)
 ## Repository structure 📁
 - [app.py](./app.py): Streamlit web app
 - [app_utils folder](./app_utils/): python modules used in the web app
 ## Possible improvements ✨
 - The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
+- You can also prepare a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in this [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
 - ...

app_utils/README.md ADDED

	@@ -0,0 +1,8 @@

+# App utils 🧰
+Python modules used in the [web app](../app.py).
+- [backend_utils.py](./backend_utils.py): backend functions to load the pipeline, answer a question, and load random questions, with appropriate Streamlit caching.
+- [frontend_utils.py](./frontend_utils.py): functions to manage the Streamlit web app appearance.
+- ⚙️ [config.py](./config.py): configuration, including the score thresholds for accepting answers and the Hugging Face model names.
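
A score threshold of the kind mentioned for `config.py` might be applied along these lines. This is only a minimal sketch, not the repository's code: `ANSWER_SCORE_THRESHOLD`, its value, the `Answer` class and `accept_answers` are hypothetical names introduced for illustration, and scores are assumed to lie between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical setting: the real name and value live in config.py.
ANSWER_SCORE_THRESHOLD = 0.5


@dataclass
class Answer:
    text: str
    score: float  # reader confidence, assumed to lie in [0, 1]


def accept_answers(answers, threshold=ANSWER_SCORE_THRESHOLD):
    """Keep the answers whose score reaches the threshold, best first."""
    kept = [a for a in answers if a.score >= threshold]
    return sorted(kept, key=lambda a: a.score, reverse=True)
```

Keeping the threshold in one configuration module means the web app can be made stricter or more permissive without touching the backend functions.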

data/README.md ADDED

	@@ -0,0 +1,10 @@

+# Data 📒📄📄
+All necessary data.
+- [input_docs](./input_docs/): JSON documents downloaded from [Twin Peaks wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki) by the [crawler](../crawler/). Input for our Question Answering system.
+- [questions](./questions/): automatically generated questions (from the [Question generation notebook](../notebooks/question_generation.ipynb)) and manually selected questions (used in the web app).
+- [index](./index/): files for the FAISS index created in the [Indexing and pipeline creation notebook](../notebooks/indexing_and_pipeline_creation.ipynb). The index is used in the web app.
+- [readme_images](./readme_images/): images used in documentation.
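
Reading the JSON documents from a folder such as `input_docs` can be sketched with the standard library alone. The helper name `load_json_docs` is hypothetical, and the sketch assumes one JSON value per `*.json` file; the actual document schema is not described here, so nothing is assumed about the fields.

```python
import json
from pathlib import Path


def load_json_docs(folder):
    """Parse every *.json file in `folder`, in file-name order."""
    docs = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            docs.append(json.load(f))
    return docs
```

Sorting the paths makes the loading order deterministic, which helps when the same documents are indexed more than once.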

data/readme_images/spaces_logo.png ADDED

data/readme_images/webapp.png ADDED

notebooks/README.md ADDED

	@@ -0,0 +1,30 @@

+# 📓 Notebooks
+Jupyter/Colab notebooks to create the Search pipeline and generate questions, using [🔍 Haystack](https://github.com/deepset-ai/haystack).
+## [Indexing and pipeline creation](./indexing_and_pipeline_creation.ipynb)
+This notebook is inspired by the ["Build Your First QA System" tutorial](https://haystack.deepset.ai/tutorials/first-qa-system) from the Haystack documentation.
+Here we use a collection of articles about Twin Peaks to answer a variety of questions about that awesome TV series!
+The following steps are performed:
+- load and preprocess data
+- create (FAISS) document store and write documents
+- initialize retriever and generate document embeddings
+- initialize reader
+- compose and try Question Answering pipeline
+- save and export (FAISS) index
+## [Question generation](./question_generation.ipynb)
+This notebook is inspired by the [Question Generation tutorial](https://haystack.deepset.ai/tutorials/question-generation) from the Haystack documentation.
+Here we use a collection of articles about Twin Peaks to generate a variety of questions about that awesome TV series!
+The following steps are performed:
+- load data
+- create document store and write documents
+- generate questions and save them
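
The indexing steps listed for the first notebook could look roughly like the following with the Haystack 1.x API. This is a hedged sketch rather than the notebook's actual code: the function name `build_qa_pipeline`, the embedding model, and the index file name are assumptions, while the reader model is the one named in the main README. The Haystack imports sit inside the function so the sketch can be read and imported without Haystack installed.

```python
def build_qa_pipeline(docs, index_path="my_faiss_index.faiss"):
    """Index `docs` in FAISS and return an extractive QA pipeline.

    `docs` is a list of dicts such as {"content": "...", "meta": {"name": "..."}}.
    """
    # Local imports: Haystack 1.x is only needed when the function is called.
    from haystack.document_stores import FAISSDocumentStore
    from haystack.nodes import EmbeddingRetriever, FARMReader
    from haystack.pipelines import ExtractiveQAPipeline

    # Create the (FAISS) document store and write the documents.
    document_store = FAISSDocumentStore(faiss_index_factory_str="Flat")
    document_store.write_documents(docs)

    # Initialize the retriever and generate document embeddings.
    retriever = EmbeddingRetriever(
        document_store=document_store,
        # Assumption: any sentence-transformers model with 768-dim output fits here.
        embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
        model_format="sentence_transformers",
    )
    document_store.update_embeddings(retriever)

    # Initialize the reader, running on CPU.
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)

    # Save the (FAISS) index, then compose the Question Answering pipeline.
    document_store.save(index_path)
    return ExtractiveQAPipeline(reader, retriever)
```

With Haystack installed, the returned pipeline can be tried with a call such as `pipe.run(query="Who killed Laura Palmer?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}})`, whose result contains the candidate answers under the `"answers"` key.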