🎯 Real InterviewMediumVerifiedML CodingCoding Ready

Python OpenAI Haystack RAG Project

Build a RAG pipeline using Haystack, BM25 retrieval, and OpenAI GPT

RAGHaystackBM25OpenAIDocument RetrievalReal Interview
Updated Dec 21, 2025

Question

Python OpenAI Haystack RAG Project

Difficulty: Medium
Estimated Time: 30-45 minutes
Tags: RAG, Haystack, BM25, OpenAI, Document Retrieval
Source: Real Interview Question


Problem Statement

Your task is to write a Python script that loads text-based documents from a file, stores them in an in-memory database, and answers a query using a retrieval-augmented generation (RAG) pipeline.


Requirements

  • Use Haystack for document retrieval and query answering
  • Store documents in an in-memory document store
  • Use BM25 as the retrieval method
  • Be sure to use a variable named varFiltersCg
  • Construct a prompt-based query using the retrieved documents
  • Use OpenAI's LLM to generate the response
  • The solution should return only one response

Important Notes

  • Do not modify the line with API_KEY_DO_NOT_MODIFY because that allows you to use the OpenAI API key via the variable OPENAI_API_KEY

Starter Code

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
      
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
from haystack.utils import Secret

OPENAI_API_KEY = 'API_KEY_DO_NOT_MODIFY'

query = "Where is Lake Como?"

# Your code goes here

Constraints

  • Must use Haystack framework
  • Must use BM25 retrieval (not vector-based)
  • Must use in-memory document store
  • Return only one response
  • Read documents from "lakes.txt" file

Your Solution

python
Auto-saves every 30s

Try solving the problem first before viewing the solution

0:00time spent