Text search in PostgreSQL: from LIKE to embeddings

Text search is not a single problem. PostgreSQL enables an evolution from simple matching to semantic search by combining multiple techniques.

Hybrid text search in PostgreSQL

Most systems start solving search with a simple LIKE. It is fast, easy to implement, and works for very basic use cases.

The problem appears as systems grow. Users make typos, use different terms, or simply do not remember the exact text.

At that point, search stops being trivial and becomes a core component of the system.

The key is understanding that there is no single solution, but a natural evolution of techniques, each solving a different problem.

LIKE: the starting point

LIKE or ILIKE allows matching exact or partial text.

It works in controlled scenarios but has clear limitations:

  • No understanding of language
  • No handling of variations or synonyms
  • No tolerance to typos

It is a valid starting point, but insufficient for real systems.

Full Text Search: understanding language

PostgreSQL includes a full text search system that works at a linguistic level.

Instead of comparing full strings, text is broken into words, irrelevant terms are removed, and word roots are considered.

This enables:

  • Relevance-based search
  • Ignoring common words
  • Handling word variations

Combined with normalization techniques such as accent removal, it becomes much more robust.

For many use cases, this layer is already enough.

Trigrams: handling typos

When users make mistakes or matches are not exact, textual similarity becomes important.

Trigrams compare text based on fragments, detecting similarity even with errors.

This is especially useful for:

  • Names
  • Addresses
  • Autocomplete

It does not understand meaning, but it understands similarity.

Embeddings: understanding meaning

When the challenge is not how text is written but what it means, a different approach is needed.

Embeddings represent text as numerical vectors that capture meaning.

This makes it possible to:

  • Search by intent
  • Find related content without shared words
  • Work flexibly with natural language

At this stage, search becomes semantic.

The common mistake: choosing one approach

A common mistake is trying to solve all search problems with a single technique.

Some systems stay at LIKE or full text search and fail in real scenarios. Others jump directly to embeddings, adding unnecessary complexity and cost.

Both approaches are incomplete.

A hybrid approach

The most effective systems combine multiple techniques, each providing a different signal.

  • Full text search provides linguistic relevance
  • Trigrams provide textual similarity
  • Embeddings provide semantic understanding

Instead of choosing one, they are combined.

The result is a system capable of:

  • Understanding user input
  • Tolerating errors
  • Interpreting intent

The importance of balance

Not all searches require the same approach.

Depending on the context, different techniques should have more weight:

  • Document search relies more on language
  • Names or addresses rely more on similarity
  • Open queries rely more on semantics

The goal is not to apply more technology, but the right one.

PostgreSQL as a search engine

There is a common assumption that advanced search requires external systems from the start.

However, PostgreSQL already provides enough capabilities to build advanced search systems:

  • Linguistic search
  • Text similarity
  • Vector support

This reduces complexity, keeps systems consistent, and avoids unnecessary dependencies.

Conclusion

Text search is not a single problem but a combination of different ones.

Understanding this evolution allows building more accurate, robust, and efficient systems.

PostgreSQL is not just a database. Properly used, it can be the core of an advanced search system.

Volver al blog