Python OpenAI Haystack RAG Project
Build a RAG pipeline using Haystack, BM25 retrieval, and OpenAI GPT
Updated Dec 21, 2025
Question
Difficulty: Medium
Estimated Time: 30-45 minutes
Tags: RAG, Haystack, BM25, OpenAI, Document Retrieval
Source: Real Interview Question
Problem Statement
Your task is to write a Python script that loads text-based documents from a file, stores them in an in-memory database, and answers a query using a retrieval-augmented generation (RAG) pipeline.
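The first step is turning the file into individual documents. The task does not specify the file's layout, so the sketch below assumes that documents are separated by blank lines. The helper name load_passages and the placeholder file contents are hypothetical. In a Haystack solution, each returned string would typically be wrapped as Document(content=...) and passed to InMemoryDocumentStore.write_documents.

```python
import tempfile
from pathlib import Path


def load_passages(path: str) -> list[str]:
    """Split a UTF-8 text file into passages separated by blank lines.

    Assumption: one document per paragraph; lines inside a paragraph are
    joined with single spaces.
    """
    passages: list[str] = []
    current: list[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far.
            passages.append(" ".join(current))
            current = []
    if current:
        # The file may not end with a blank line.
        passages.append(" ".join(current))
    return passages


# Usage with a throwaway file standing in for lakes.txt (placeholder text).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "lakes.txt"
    sample_path.write_text(
        "Alpha line one.\nAlpha line two.\n\n\nBeta passage.\n", encoding="utf-8"
    )
    sample = load_passages(str(sample_path))

print(sample)  # ['Alpha line one. Alpha line two.', 'Beta passage.']
```

If the file instead holds one document per line, splitting on newlines and skipping empty lines is the simpler choice.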
Requirements
- Use Haystack for document retrieval and query answering
- Store documents in an in-memory document store
- Use BM25 as the retrieval method
- Be sure to use a variable named varFiltersCg
- Construct a prompt-based query using the retrieved documents
- Use OpenAI's LLM to generate the response
- The solution should return only one response
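The retrieval and prompt-construction steps listed above can be illustrated without the library. The sketch below is a plain-Python stand-in, not Haystack's implementation. It scores documents with classic Okapi BM25, assuming the common defaults k1 = 1.5 and b = 0.75 and a naive lowercase tokenizer. It then places the best matches ahead of the question. The function names bm25_top_k and build_prompt are hypothetical, and the call to OpenAI is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def bm25_top_k(query: str, docs: list[str], k: int = 3,
               k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank docs against query with Okapi BM25 and return the best k."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    scored = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(toks) / avgdl) if avgdl else k1
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scored.append((score, idx))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop documents that share no terms with the query.
    return [docs[idx] for score, idx in scored[:k] if score > 0]


def build_prompt(query: str, documents: list[str]) -> str:
    """Put the retrieved documents in front of the question."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return (
        "Answer the question using only the documents below.\n"
        f"Documents:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Sample passages for illustration only, not the real contents of lakes.txt.
docs = [
    "Lake Como is in northern Italy.",
    "Lake Titicaca lies on the border of Peru and Bolivia.",
    "Loch Ness is in the Scottish Highlands.",
]
query = "Where is Lake Como?"
top_docs = bm25_top_k(query, docs, k=2)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In the Haystack version, InMemoryBM25Retriever plays the role of bm25_top_k, and its top_k argument caps the number of documents. PromptBuilder renders a Jinja2 template in place of build_prompt. The resulting prompt then goes to OpenAIGenerator, whose "replies" output holds the generated answer.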
Important Notes
- Do not modify the line containing API_KEY_DO_NOT_MODIFY; it is what lets you use the OpenAI API key through the variable OPENAI_API_KEY
Starter Code
```python
import os
# Silence TensorFlow's C++ logging (3 = hide INFO, WARNING, and ERROR messages).
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
from haystack.utils import Secret
OPENAI_API_KEY = 'API_KEY_DO_NOT_MODIFY'
query = "Where is Lake Como?"
# Your code goes here
```
Constraints
- Must use Haystack framework
- Must use BM25 retrieval (not vector-based)
- Must use in-memory document store
- Return only one response
- Read documents from the "lakes.txt" file