Using the Pinecone Vector Database for Semantic Search in Laravel

When you want search results that match the meaning of a query, rather than the exact words used in it, you need semantic search.

Semantic search requires the "meaning" of the data you are searching to be represented numerically. This is accomplished using embeddings: vectors of numbers that represent a piece of text (a word, a sentence, a whole document) in such a way that texts with similar meanings get similar vectors.
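
To make "similar vectors" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The function name and the tiny three-number vectors are made up for illustration; real embeddings come from a model and are much longer.

```php
<?php

/**
 * Cosine similarity between two equal-length vectors.
 * The result is between -1 and 1; higher means the vectors point in a more similar direction.
 *
 * @param float[] $a
 * @param float[] $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector.');
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Made-up "embeddings", purely for illustration.
$cat    = [0.9, 0.1, 0.0];
$kitten = [0.8, 0.2, 0.1];
$car    = [0.0, 0.1, 0.9];

// "cat" is closer to "kitten" than to "car".
var_dump(cosineSimilarity($cat, $kitten) > cosineSimilarity($cat, $car)); // bool(true)
```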

The embeddings data needs to be stored somewhere to perform semantic search. There are quite a few options out there by now, but generally what you're looking for is a vector database. It allows storage, indexing, and fast lookups of embeddings.
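
To get a feel for what a vector database does for you, here is a toy in-memory stand-in. It is only a sketch: the class and its method names are hypothetical, and it compares the query against every stored vector, which is exactly the slow part a real vector database avoids with proper indexing.

```php
<?php

/**
 * Toy in-memory vector store (illustration only, not how Pinecone works internally).
 * Vectors are normalized to unit length on write, so a plain dot product
 * at query time equals cosine similarity.
 */
final class InMemoryVectorStore
{
    /** @var array<string, float[]> unit-length vectors keyed by ID */
    private array $vectors = [];

    private ?int $dimension = null;

    /** @param float[] $values */
    public function upsert(string $id, array $values): void
    {
        $this->vectors[$id] = $this->normalize($values);
    }

    /**
     * @param float[] $queryValues
     * @return array<string, float> ID => similarity score, best match first
     */
    public function query(array $queryValues, int $topK = 10): array
    {
        $query = $this->normalize($queryValues);
        $scores = [];

        // Brute force: score every stored vector against the query.
        foreach ($this->vectors as $id => $vector) {
            $scores[$id] = array_sum(array_map(fn ($x, $y) => $x * $y, $query, $vector));
        }

        arsort($scores); // highest score first, keys preserved

        return array_slice($scores, 0, $topK, true);
    }

    /**
     * @param float[] $values
     * @return float[]
     */
    private function normalize(array $values): array
    {
        $values = array_values($values);
        $this->dimension ??= count($values);

        if (count($values) !== $this->dimension) {
            throw new InvalidArgumentException('All vectors must have the same dimension.');
        }

        $norm = sqrt(array_sum(array_map(fn ($v) => $v ** 2, $values)));

        if ($norm == 0.0) {
            throw new InvalidArgumentException('Cannot normalize a zero vector.');
        }

        return array_map(fn ($v) => $v / $norm, $values);
    }
}

$store = new InMemoryVectorStore();
$store->upsert('doc-cats', [0.9, 0.1, 0.0]);
$store->upsert('doc-cars', [0.0, 0.1, 0.9]);

$results = $store->query([0.8, 0.2, 0.1], topK: 1);

print_r(array_keys($results)); // the single closest ID: doc-cats
```

This is fine for a handful of vectors, but the full scan per query is why you want a dedicated service once your data grows.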

Here, I'm going to show you how to use Pinecone in Laravel for semantic search. Pinecone is a popular vector database that is easy to use and has a free tier.

First, let's install a Composer package that makes it easy to use Pinecone in a Laravel app.

composer require probots-io/pinecone-php

Next, let's register the probots Pinecone client in our service container so that we can instantiate it with config values automatically populated.

namespace App\Providers;

use Probots\Pinecone\Client as Pinecone;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->bind(Pinecone::class, fn () => new Pinecone(
            config('services.pinecone.api_key'),
            config('services.pinecone.environment')
        ));
    }
}

Add these values to your config/services.php file. You can find them in your Pinecone dashboard.

// config/services.php

return [
    // ...
    'pinecone' => [
        'api_key' => env('PINECONE_API_KEY', ''),
        'environment' => env('PINECONE_ENVIRONMENT', ''),
    ],
];

For illustration purposes, let's make a simple Semantic Search client that can save and search embeddings. The index name is another value that you can get from your Pinecone dashboard.

namespace App;

use Probots\Pinecone\Client as Pinecone;
use Ramsey\Uuid\Uuid;

class SemanticSearchClient
{
    public function __construct(
        private readonly Pinecone $pinecone,
    ) {
    }

    /**
     * @param float[] $documentEmbedding
     */
    public function save(array $documentEmbedding): void
    {
        $this->pinecone->index('your-index-name')->vectors()->upsert(
            vectors: [
                [
                    'id' => Uuid::uuid4()->toString(),
                    'values' => $documentEmbedding,
                ]
            ]
        );
    }

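    /**
     * @param float[] $queryEmbedding
     */
    public function search(array $queryEmbedding)
    {
        // Return the response so the caller can read the matches.
        // Check the named arguments against the README of your installed package version.
        return $this->pinecone->index('your-index-name')->vectors()->query(
            query: [
                'topK' => 10, // top 10 results, most similar/relevant first
                'query' => [
                    'values' => $queryEmbedding, // the raw float[] embedding
                ]
            ]
        );
    }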
}
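
One thing to keep in mind: with a random UUID as the ID, a search hit can't be traced back to the document it came from. A possible approach, sketched below, is to use your own document ID and attach some metadata. The `id`, `values`, and `metadata` keys are the field names Pinecone's upsert API accepts; the helper function and the metadata keys are hypothetical, and I'm assuming the package passes the array through as-is.

```php
<?php

/**
 * Builds one entry for the `vectors` list of an upsert call.
 * Hypothetical helper; only the array keys come from Pinecone's API.
 *
 * @param float[] $embedding
 * @param array<string, mixed> $metadata e.g. ['title' => '...'] (made-up keys)
 * @return array<string, mixed>
 */
function makeVectorRecord(string $documentId, array $embedding, array $metadata = []): array
{
    if ($embedding === []) {
        throw new InvalidArgumentException('The embedding must not be empty.');
    }

    $record = [
        'id' => $documentId, // upserting the same ID again overwrites the stored vector
        'values' => array_map('floatval', array_values($embedding)),
    ];

    // Only include metadata when there is some.
    if ($metadata !== []) {
        $record['metadata'] = $metadata;
    }

    return $record;
}

$record = makeVectorRecord('post-42', [0.1, 0.2, 0.3], ['title' => 'Hello world']);

print_r($record);
```

You could then pass `[$record]` as the `vectors` argument in `save()` instead of building the array inline.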

And that's it! Now you can use this client to save and search embeddings.

$semanticSearch = app(SemanticSearchClient::class);

// Just an example; you'll need to generate these embeddings from your data.
// The vector length must match the dimension of your Pinecone index.
$documentEmbedding = [0.1, 0.2, 0.3, 0.4, 0.5];

$semanticSearch->save($documentEmbedding);

// later

// Just an example.
$queryEmbedding = [0.01, 0.3, 0.2, 0.4, 0.5];

$results = $semanticSearch->search($queryEmbedding);

dd($results);
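
Once you have results, you'll usually want just the IDs and scores. Pinecone's query API returns a `matches` list where each entry has an `id` and a `score`. Here is a rough sketch of pulling those out, assuming you have already decoded the response body into a plain array (how you do that depends on the response object your package version returns). The function name and the sample data are made up.

```php
<?php

/**
 * Extracts ID and score from a decoded query response, dropping weak matches.
 *
 * @param array<string, mixed> $response decoded response containing a 'matches' list
 * @return array<int, array{id: string, score: float}>
 */
function extractMatches(array $response, float $minScore = -INF): array
{
    $matches = [];

    foreach ($response['matches'] ?? [] as $match) {
        $score = (float) ($match['score'] ?? 0.0);

        if ($score >= $minScore) {
            $matches[] = ['id' => (string) $match['id'], 'score' => $score];
        }
    }

    return $matches;
}

// Hand-written sample in the shape of a query response (not real output).
$sample = [
    'matches' => [
        ['id' => 'post-42', 'score' => 0.93],
        ['id' => 'post-7', 'score' => 0.41],
    ],
    'namespace' => '',
];

print_r(extractMatches($sample, minScore: 0.5)); // keeps only post-42
```

The right score cutoff depends on your embedding model and your index's similarity metric, so treat 0.5 as a placeholder.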