Retrieval-Augmented Generation (RAG) with LangChain4j and the Oracle Database 23ai

Juarez Junior

Introduction

As you may have heard, Oracle released Oracle Database 23ai, an accessible, free, yet feature-complete offering of the industry-leading Oracle Database.

Oracle Database 23ai now includes semantic search capabilities using AI vectors. The collection of features, called Oracle AI Vector Search, includes a new vector data type, vector indexes, and vector search SQL operators that enable the Oracle Database to store the semantic content of documents, images, and other unstructured data as vectors, and use these to build Generative AI applications with fast similarity queries.
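
To make "similarity" concrete, here is a minimal plain-Java sketch of cosine distance, one of the metrics commonly used to compare embedding vectors. It only illustrates the idea and is not how Oracle AI Vector Search is implemented; the class and method names are hypothetical.

```java
// Illustrative only: the kind of metric a vector similarity query evaluates.
// Hypothetical names; this is not Oracle's implementation.
final class CosineDistanceSketch {

    /** Returns 1 - cos(angle between a and b); 0 means the same direction. */
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] close = {2f, 0f}; // same direction as the query
        float[] far = {0f, 3f};   // orthogonal to the query
        System.out.println(cosineDistance(query, close)); // 0.0
        System.out.println(cosineDistance(query, far));   // 1.0
    }
}
```

A smaller distance means the two vectors, and therefore the texts they represent, are closer in meaning.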

These new capabilities also support Retrieval Augmented Generation (RAG), a breakthrough generative AI technique that combines Large Language Models (LLMs) and business data to deliver more informed answers to natural language questions.

To further understand why Oracle AI Vector Search is a powerful feature, please have a look at the blog post "What Is Vector Search? The Ultimate Guide."

It’s beyond the scope of this blog post to teach you about all things LangChain4J. Nevertheless, LangChain4j is a library that simplifies integrating LLMs into Java applications. It allows you to create AI Services and implement several GenAI use cases, such as RAG, Tools, Chains, Chat Completion, Multimodal, and others.

Last but not least, I assume OpenAI needs no introduction at all. Pretty much everyone has heard about it, and its most popular creation — ChatGPT. I will show you how you can use LangChain4J with an embedding and chat model to interact with GenAI models provided by OpenAI.

This blog post provides a quick guide to building a RAG application with LangChain4J, the Oracle Database 23ai, its JDBC Driver, and OpenAI. So, without further ado, let’s get started!

Prerequisites

Oracle Database 23ai

If you check the download link, you will see that it is available as a Linux x64 RPM installation file (OL8 or RHEL8). So, if you’re a Java developer using Windows or macOS, you may wonder how you can kick the tires on it.

The good news is that a container image is also available on Oracle Container Registry, which you can use to install the database and test its features. Let’s try it on Windows. First, run the docker pull command below:

docker pull container-registry.oracle.com/database/free:latest

Now, you can run the docker images command to confirm that the image was pulled properly:

Oracle Database 23ai — container image

Then, run the command below from the Windows Command Prompt (cmd):

docker run -d -p 1521:1521 -e ORACLE_PWD=<your_password> -v oracle-volume:/opt/oracle/oradata container-registry.oracle.com/database/free:latest

Replace <your_password> above with your chosen password. The command runs a new persistent database container: the named volume oracle-volume keeps the data across container restarts.

If everything goes well, you can run the command docker ps -al to confirm that your container instance is starting as expected.

Container status — health: starting

Please note that the Oracle Database 23ai will be ready to use when the STATUS field shows (healthy), as below.

Container status — healthy

There are some additional options if you want to configure environment variables to support your connection details, such as username, password, and other variables. You can also check more details about the specific database user (SYS, SYSTEM, PDBADMIN) you may want to use.

If you want to explore such options, please check the official page for the related container image on the Oracle Container Registry.

Now your database is installed! Let’s proceed to access it from both SQL Developer and Java!

Connect to Oracle Database 23ai Free

The next step is to connect to Oracle Database 23ai to run the required DDL script. Please follow the instructions in this blog post, which covers how to do so with the IntelliJ IDEA tool.

Alternatively, you can also use Oracle SQL Developer, a free tool that simplifies the development and management of Oracle Databases in traditional and cloud deployments, or any other tool you might prefer.

Connect to your DB 23ai Free instance and run the DDL script

Now, you must connect to the DB 23ai Free database instance and execute the DDL script, langchain4j-oracle.sql, to create the tables for our sample LangChain4J application.

Open the Oracle SQL Developer tool as explained previously, copy the DDL script from the Gist below, and execute it by clicking the Run Statement button.

Note that the OracleDBUtils.java class has the Oracle Database connection details. Adjust the values according to your environment.

OracleDBUtils.java class — JDBC connection details

Make sure the username and password meet Oracle’s requirements for database credentials. Otherwise, the script will fail during execution. Also, set them as environment variables as shown above.
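
As a rough illustration of that setup, the sketch below reads credentials from environment variables and builds a JDBC thin URL. The variable names ORACLE_DB_USER and ORACLE_DB_PASSWORD and the class name are hypothetical, and FREEPDB1 is assumed to be the pluggable database service exposed by the Free container image. Check OracleDBUtils.java and your own container for the actual values.

```java
import java.util.Map;

// Hypothetical helper, not the sample's OracleDBUtils.java.
final class ConnectionDetailsSketch {

    // Assumed variable names; use whatever your environment defines.
    static final String USER_VAR = "ORACLE_DB_USER";
    static final String PASSWORD_VAR = "ORACLE_DB_PASSWORD";

    /** Builds a JDBC thin URL in the host:port/service form. */
    static String jdbcUrl(String host, int port, String serviceName) {
        return "jdbc:oracle:thin:@//" + host + ":" + port + "/" + serviceName;
    }

    /** Fails fast with a clear message when a required variable is missing. */
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        // Port 1521 matches the docker run mapping; FREEPDB1 is an assumption.
        String url = jdbcUrl("localhost", 1521, "FREEPDB1");
        System.out.println("JDBC URL: " + url);
        System.out.println("User variable set: " + System.getenv().containsKey(USER_VAR));
        // With the Oracle JDBC driver on the classpath, a connection could be opened with:
        // java.sql.DriverManager.getConnection(url,
        //         requireEnv(System.getenv(), USER_VAR),
        //         requireEnv(System.getenv(), PASSWORD_VAR));
    }
}
```

Keeping credentials in environment variables avoids hardcoding them in source files that may end up in version control.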

The LangChain4J application

This post does not cover LangChain4J in depth. There are plenty of tutorials on the subject, including the official LangChain4J GitHub repository.

Our main goal is to show you how to properly implement a GenAI RAG application with LangChain4J, OpenAI, Oracle AI Vector Search, and the Oracle Database 23ai.

We’ll provide an overview of the code sample's main components so you can understand what to expect it to do.

LangChain4JOpenAiBasicChat.java

The first scenario is a simple test application that performs a basic chat interaction with OpenAI. We’ll ask a question about the blog author and get an imprecise answer, because OpenAI does not know about him. That is why we’ll later augment the context of this interaction through the RAG use case, using a PDF file with more information about the blog author.

LangChain4JOpenAiBasicChat.java

As expected, it gets a wrong answer.

The second scenario covers the main application components, which are based on LangChain4J: specifically, OracleEmbeddingStore, an implementation of the required EmbeddingStore Java interface, together with the Oracle AI Vector Search features of Oracle Database 23ai.

Code sample — application components

OracleDb23aiLangChain4JOpenAiRag.java

This class is the implementation at the core of our Retrieval-Augmented Generation (RAG) use case. Note that it is the main entry point (main method) to execute the final application.

It also uses a PDF file, Juarez Barbosa Junio.pdf, under /src/main/resources, to compare a question about the author answered without RAG against one answered with some information from his online profile.

It supports the implementation of RAG by performing four basic tasks as below:

  • Gets the PDF file to be ingested and stored as vectors
  • Extracts the content from the PDF file
  • Initializes the OpenAI embedding model to support the embedding step
  • Initializes the Vector embedding store per the requirements

OracleDb23aiLangChain4JOpenAiRag.java

OracleEmbeddingStoreIngestor.java

This class creates an ingestor component that generates embeddings with an EmbeddingModel and stores them in the related EmbeddingStore implementation (Oracle).

OracleEmbeddingStoreIngestor.java
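
For intuition, an ingestion flow of this kind can be sketched in plain Java as split, embed, and store. Everything here is a simplified assumption: the class, the fixed-size splitting with overlap, the Function-based embedder, and the in-memory list are hypothetical stand-ins, not the sample's OracleEmbeddingStoreIngestor or LangChain4J's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an ingestor; not LangChain4J code.
final class IngestorSketch {

    /** A text segment paired with its embedding (stand-in for a table row). */
    static final class StoredEmbedding {
        final String segment;
        final float[] vector;

        StoredEmbedding(String segment, float[] vector) {
            this.segment = segment;
            this.vector = vector;
        }
    }

    /** Splits text into segments of at most maxChars, sharing overlap characters. */
    static List<String> split(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("Require maxChars > overlap >= 0");
        }
        List<String> segments = new ArrayList<>();
        int step = maxChars - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last segment reached the end of the text
            }
        }
        return segments;
    }

    /** Embeds every segment and collects the results, as a store would persist them. */
    static List<StoredEmbedding> ingest(String text, Function<String, float[]> embedder,
                                        int maxChars, int overlap) {
        List<StoredEmbedding> store = new ArrayList<>();
        for (String segment : split(text, maxChars, overlap)) {
            store.add(new StoredEmbedding(segment, embedder.apply(segment)));
        }
        return store;
    }

    public static void main(String[] args) {
        // Toy embedder: a real one would call an embedding model.
        Function<String, float[]> toyEmbedder = s -> new float[] {s.length()};
        List<StoredEmbedding> stored = ingest("abcdefghij", toyEmbedder, 4, 1);
        for (StoredEmbedding e : stored) {
            System.out.println(e.segment + " -> dimension " + e.vector.length);
        }
    }
}
```

In the real application, the embedder role is played by the OpenAI embedding model and the list is replaced by rows in the Oracle Database.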

OracleEmbeddingStore.java

This class implements dev.langchain4j.store.embedding.EmbeddingStore for the Oracle Database 23ai. It also holds the information needed to support the Oracle AI Vector Search mechanisms, the Oracle Database vector store, and the associated vector embeddings.

OracleEmbeddingStore.java

Utility Classes

The remaining classes in our code samples are only helper, utility classes that support the general RAG implementation.

Run the LangChain4J application

Change to the root directory of your project and run the Maven command below. For example:

cd <YOUR_WORKSPACE_ROOT_DIRECTORY>\langchain4j-oracle
mvn clean package

Select the main class, and execute it with your preferred IDE. We’ll do it here in debugging mode just for learning purposes.

First, the model name and dimensions are logged.

Then, the PDF content is logged.

Then, you will see that we initialize the Oracle embedding store ingestor component.

Next, it will ingest the PDF file.

Then, the embedding-related steps run.

To confirm that the embeddings were stored, you can run a SQL query to see the actual vector embeddings.

Finally, a vector similarity search will be performed.
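
As a hedged sketch of what such a search can look like at the SQL level, the helper below builds a top-k query with the VECTOR_DISTANCE and TO_VECTOR functions from Oracle AI Vector Search. The table and column names (rag_embeddings, content, embedding) are hypothetical, since the actual names come from the langchain4j-oracle.sql script, and the sample's OracleEmbeddingStore may build its query differently.

```java
import java.util.StringJoiner;

// Hypothetical helper; table and column names are assumptions.
final class VectorSearchSqlSketch {

    /** Formats a vector as text, e.g. [0.5,1.0,-2.0], for binding to TO_VECTOR(?). */
    static String toVectorLiteral(float[] vector) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : vector) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    /**
     * Builds a query returning the k rows closest to the bound query vector.
     * Identifiers are concatenated, so pass only trusted constants here.
     */
    static String similarityQuery(String table, String textColumn, String vectorColumn, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        return "SELECT " + textColumn + " FROM " + table
                + " ORDER BY VECTOR_DISTANCE(" + vectorColumn + ", TO_VECTOR(?), COSINE)"
                + " FETCH FIRST " + k + " ROWS ONLY";
    }

    public static void main(String[] args) {
        System.out.println(similarityQuery("rag_embeddings", "content", "embedding", 3));
        System.out.println(toVectorLiteral(new float[] {0.5f, 1f, -2f}));
        // With a JDBC connection, the literal would be bound to the single parameter:
        // preparedStatement.setString(1, toVectorLiteral(queryEmbedding));
    }
}
```

The query embedding is passed as a bind parameter rather than concatenated into the SQL text, which keeps the statement reusable across searches.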

Last but not least, a more meaningful answer is provided as a response.
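
The reason the answer improves is that the retrieved text is added to the question before it is sent to the chat model. A minimal sketch of that augmentation step follows; the class name, method name, and prompt wording are hypothetical and are not taken from the sample or from LangChain4J.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the "augmented" part of RAG.
final class RagPromptSketch {

    /** Combines retrieved segments and the user question into one prompt. */
    static String buildPrompt(String question, List<String> retrievedSegments) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Answer the question using only the context below.\n\nContext:\n");
        for (String segment : retrievedSegments) {
            prompt.append("- ").append(segment.trim()).append('\n');
        }
        prompt.append("\nQuestion: ").append(question.trim());
        return prompt.toString();
    }

    public static void main(String[] args) {
        // Placeholder segments; in the application they come from the similarity search.
        List<String> segments = Arrays.asList(
                "First segment retrieved from the PDF.",
                "Second segment retrieved from the PDF.");
        System.out.println(buildPrompt("Who is the blog author?", segments));
    }
}
```

Without the context section, the model can only rely on what it learned during training, which is why the first scenario produced an imprecise answer.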

Wrapping it up

That’s it. You can now proceed to create your applications with LangChain4J, OpenAI, JDBC, and the Oracle Database 23ai!

I hope you liked this blog post. I will soon post more examples with other GenAI libraries and frameworks. Stay tuned!

References

Oracle Database 23ai

LangChain4J

Oracle Database Free Release 23ai — Container Image

Oracle® Database JDBC Java API Reference, Release 23ai

Oracle JDBC Driver 23ai (23.4.0.24.05) — Maven Central

Developers Guide For Oracle JDBC on Maven Central

Develop Java applications with Oracle Database

Oracle Developers and Oracle OCI Free Tier

Join our Oracle Developers channel on Slack to discuss Java, JDK, JDBC, GraalVM, Microservices with Spring Boot, Helidon, Quarkus, Micronaut, Reactive Streams, Cloud, DevOps, IaC, and other topics!

Build, test, and deploy your applications on Oracle Cloud — for free! Get access to OCI Cloud Free Tier!
