Using pgvector embeddings search in Laravel

When you want search results that match the meaning of a query or question, rather than the exact words used, you need semantic search.

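Semantic search requires the "meaning" to be somehow numerically represented in the data that you are searching. This is accomplished using embeddings, which are vector representations of text (tokens, words, sentences, or whole documents). Texts with similar meanings end up close to each other in the vector space, so "finding related content" becomes "finding nearby vectors".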
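
To make "close to each other" a bit more concrete, here is a minimal sketch in plain PHP that measures the Euclidean (L2) distance between a few vectors. The three-dimensional vectors and the l2Distance() helper are made up for illustration; real embeddings come from a model and have far more dimensions.

```php
<?php

/**
 * Euclidean (L2) distance between two equal-length vectors.
 * Smaller values mean the vectors are closer together.
 */
function l2Distance(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimensionality.');
    }

    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

// Made-up toy "embeddings", not output from any real model.
$dog    = [0.9, 0.1, 0.0];
$canine = [0.8, 0.2, 0.1];
$bird   = [0.1, 0.9, 0.3];

printf("dog vs canine: %.3f\n", l2Distance($dog, $canine));
printf("dog vs bird:   %.3f\n", l2Distance($dog, $bird));
```

With these toy values, "dog" lands much closer to "canine" than to "bird", which is exactly the property semantic search relies on.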

The embeddings data needs to be stored somewhere to perform semantic search. There are quite a few options out there by now, but generally what you're looking for is a vector database. It allows storage, indexing, and fast lookups of embeddings.

What if you didn't want to use a hosted, managed provider to store your embeddings, but would rather keep them in your own database? Well, you're in luck!

Here, I'm going to show you how to use pgvector for semantic search in Laravel Eloquent queries. pgvector is an open-source vector similarity search extension for Postgres.

First, you'll need a Postgres installation. You can install this with Homebrew, on Linux with apt, or with Docker. I'll use Docker below, but the installation type doesn't really matter here.

docker pull ankane/pgvector

Next, let's run a Postgres container with pgvector installed.

docker volume create pgvector-data

docker run --name pgvector -v pgvector-data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=password -e POSTGRES_USER=forge --publish 5432:5432 -it ankane/pgvector

This creates a volume to persist the data in the container, and runs the container with the Postgres port exposed on 5432. You can also stop/start the container by name ("pgvector"), rather than ID.

Next, in your Laravel app, add the pgvector package and publish its extension-enabling migration.

composer require ankane/pgvector

php artisan vendor:publish --tag="pgvector-migrations"

And set your database driver to pgsql in your .env file.

# .env

DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=forge
DB_USERNAME=forge
DB_PASSWORD=password

Let's treat our embeddings as first-class models, with their own Eloquent model, migration, and factory.

php artisan make:model --migration --factory Embedding

In the migration, let's create a vector column to store the raw embeddings, as well as a JSON column for metadata to return along with the results. You can put anything you want into that one.

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('embeddings', function (Blueprint $table) {
            $table->id();
            $table->vector('embedding', 1536); // Dimensionality; 1536 for OpenAI's ada-002
            $table->json('metadata');
            $table->timestamps();
        });

        // ivfflat is pgvector's approximate nearest-neighbor index. It keeps searches fast
        // when there are a lot of high-dimensional embeddings in the database.
        // vector_l2_ops matches the L2 distance operator (<->) used in the search query later on.
        DB::statement('CREATE INDEX my_index ON embeddings USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)');
    }

    public function down(): void
    {
        Schema::dropIfExists('embeddings');
    }
};
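
A quick note on the lists = 100 value in that index. As far as I understand the pgvector documentation, a reasonable starting point is rows / 1000 for up to 1M rows and sqrt(rows) beyond that, with sqrt(lists) as a starting point for the number of probes at query time. The sketch below just encodes that rule of thumb; the function names suggestedLists() and suggestedProbes() are my own and not part of any package.

```php
<?php

/**
 * Suggest a starting value for the ivfflat "lists" parameter.
 * Rule of thumb: rows / 1000 up to 1M rows, sqrt(rows) above that.
 */
function suggestedLists(int $rowCount): int
{
    if ($rowCount <= 1000000) {
        // Never go below a single list, even for tiny tables.
        return max(1, intdiv($rowCount, 1000));
    }

    return (int) round(sqrt($rowCount));
}

/**
 * Suggest a starting value for the number of probes at query time: sqrt(lists).
 */
function suggestedProbes(int $lists): int
{
    return max(1, (int) round(sqrt($lists)));
}

printf("lists for 100,000 rows:   %d\n", suggestedLists(100000));
printf("lists for 4,000,000 rows: %d\n", suggestedLists(4000000));
printf("probes for 100 lists:     %d\n", suggestedProbes(100));
```

By that measure, lists = 100 fits a table of roughly 100,000 embeddings. Treat these numbers as a starting point to tune, not a guarantee. For a handful of rows the index makes little difference, and the pgvector docs suggest creating an ivfflat index once the table already contains data.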

In the model, let's add the relevant casts.

namespace App\Models;

use Illuminate\Database\Eloquent\Factories\HasFactory;
use Illuminate\Database\Eloquent\Model;
use Pgvector\Laravel\Vector;

class Embedding extends Model
{
    use HasFactory;

    protected $guarded = [];

    protected $casts = [
        'embedding' => Vector::class,
        'metadata' => 'array',
    ];
}

Time to run our migrations. This will enable the pgvector extension in Postgres (migration from the package) and create our embeddings table.

php artisan migrate

To demonstrate embeddings + semantic search in a more real-world-like scenario, we'll use OpenAI's ada-002 embedding model. Add the OpenAI Laravel package if you haven't already.

composer require openai-php/laravel

php artisan vendor:publish --provider="OpenAI\Laravel\ServiceProvider"

And add your OpenAI API key to your .env file.

# .env

OPENAI_API_KEY="sk-..."

# https://platform.openai.com/account/org-settings
OPENAI_ORGANIZATION="org-..."

Now that the setup is out of the way, let's create a simple console command to populate our database with embeddings.

// In routes/console.php

use Illuminate\Support\Facades\Artisan;
use OpenAI\Laravel\Facades\OpenAI;
use App\Models\Embedding;

Artisan::command('insert', function() {

    $sayings = [
        'Felines say meow',
        'Canines say woof',
        'Birds say tweet',
        'Humans say hello',
    ];

    $result = OpenAI::embeddings()->create([
        'model' => 'text-embedding-ada-002',
        'input' => $sayings
    ]);

    foreach ($sayings as $key => $saying) {
        Embedding::query()->create([
            'embedding' => $result->embeddings[$key]->embedding,
            'metadata' => [
                'saying' => $saying,
            ]
        ]);
    }
});

Run that once:

php artisan insert

You can check that they've been inserted:

docker exec -it \
    pgvector \
    psql -U forge -d forge -c "select count(*) from embeddings"
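
Four sayings fit comfortably in a single API request, but a real dataset probably won't. Here is a rough sketch of how the insert step could be batched. Everything in it is an assumption for illustration: buildEmbeddingRows() is a hypothetical helper, the batch size is arbitrary, and the embedder is passed in as a callable so the example runs without an API key (the fake one below just derives two numbers from each string).

```php
<?php

/**
 * Embed $inputs in batches and return one row per input, in the original order.
 *
 * @param string[] $inputs
 * @param callable $embedBatch Takes a list of strings, returns a list of vectors in the same order.
 */
function buildEmbeddingRows(array $inputs, callable $embedBatch, int $batchSize = 100): array
{
    $rows = [];

    // array_chunk() re-indexes every batch from 0, so $i lines up with $vectors.
    foreach (array_chunk($inputs, $batchSize) as $batch) {
        $vectors = $embedBatch($batch);

        if (count($vectors) !== count($batch)) {
            throw new RuntimeException('Embedder returned a different number of vectors than inputs.');
        }

        foreach ($batch as $i => $text) {
            $rows[] = [
                'embedding' => $vectors[$i],
                'metadata' => ['saying' => $text],
            ];
        }
    }

    return $rows;
}

// Stand-in embedder for demonstration only; it does NOT produce meaningful embeddings.
$fakeEmbedder = fn (array $batch): array => array_map(
    fn (string $text): array => [(float) strlen($text), (float) str_word_count($text)],
    $batch
);

$rows = buildEmbeddingRows(
    ['Felines say meow', 'Canines say woof', 'Birds say tweet'],
    $fakeEmbedder,
    2 // tiny batch size so the example actually exercises more than one batch
);

echo count($rows), " rows ready to insert\n";
```

In the actual command, the callable would wrap the OpenAI::embeddings()->create() call from the insert command and return the embedding arrays in order, and each row would go to Embedding::query()->create().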

And now let's run a query! Back in your console routes file:

// In routes/console.php

use Pgvector\Laravel\Vector;

Artisan::command('search', function() {
    $result = OpenAI::embeddings()->create([
        'model' => 'text-embedding-ada-002',
        'input' => 'What do dogs say?',
    ]);

    $embedding = new Vector($result->embeddings[0]->embedding);

    $this->table(
        ['saying'],
        Embedding::query()
            // <-> is pgvector's L2 (Euclidean) distance operator; the smallest distance comes first.
            ->orderByRaw('embedding <-> ?', [$embedding])
            ->take(2)
            ->pluck('metadata')
    );
});

Run it, and you should get the two sayings closest to the question:

php artisan search

+------------------+
| saying           |
+------------------+
| Canines say woof |
| Felines say meow |
+------------------+
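
If you're wondering what that query does conceptually, it ranks every row by its distance to the query embedding and keeps the closest ones. Below is a brute-force sketch of the same idea in plain PHP. The three-dimensional vectors are invented so that the ranking mirrors the output above; they are not real ada-002 embeddings, and euclideanDistance() and nearestNeighbors() are illustrative helpers of my own, not part of pgvector.

```php
<?php

/** Euclidean (L2) distance, the metric behind the <-> operator. */
function euclideanDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

/**
 * Return the $limit rows closest to $query, nearest first.
 * Roughly what "ORDER BY embedding <-> ? LIMIT n" does without an index.
 */
function nearestNeighbors(array $rows, array $query, int $limit): array
{
    usort(
        $rows,
        fn (array $x, array $y): int =>
            euclideanDistance($x['embedding'], $query) <=> euclideanDistance($y['embedding'], $query)
    );

    return array_slice($rows, 0, $limit);
}

// Invented toy vectors standing in for the stored embeddings.
$rows = [
    ['embedding' => [0.4, 0.6, 0.2], 'metadata' => ['saying' => 'Felines say meow']],
    ['embedding' => [0.2, 0.8, 0.1], 'metadata' => ['saying' => 'Canines say woof']],
    ['embedding' => [0.9, 0.1, 0.3], 'metadata' => ['saying' => 'Birds say tweet']],
    ['embedding' => [0.1, 0.2, 0.9], 'metadata' => ['saying' => 'Humans say hello']],
];

// Invented stand-in for the embedding of "What do dogs say?".
$query = [0.1, 0.9, 0.1];

$top = nearestNeighbors($rows, $query, 2);

foreach ($top as $row) {
    echo $row['metadata']['saying'], "\n";
}
```

Doing this in PHP means loading and comparing every embedding on each search. Pushing the ordering into Postgres, with the ivfflat index from the migration, is what keeps this workable as the table grows.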

And that's it! You now have embeddings-based semantic search working in a Postgres database with Laravel without an external vector database.