Files
navidrome/persistence/mediafile_repository.go
T
Deluan Quintão 54de0dbc52 feat(server): implement FTS5-based full-text search (#5079)
* build: add sqlite_fts5 build tag to enable FTS5 support

* feat: add SearchBackend config option (default: fts)

* feat: add buildFTS5Query for safe FTS5 query preprocessing

* feat: add FTS5 search backend with config toggle, refactor legacy search

- Add searchExprFunc type and getSearchExpr() for backend selection
- Rename fullTextExpr to legacySearchExpr
- Add ftsSearchExpr using FTS5 MATCH subquery
- Update fullTextFilter in sql_restful.go to use configured backend

* feat: add FTS5 migration with virtual tables, triggers, and search_participants

Creates FTS5 virtual tables for media_file, album, and artist with
unicode61 tokenizer and diacritic folding. Adds search_participants
column, populates from JSON, and sets up INSERT/UPDATE/DELETE triggers.

* feat: populate search_participants in PostMapArgs for FTS5 indexing

* test: add FTS5 search integration tests

* fix: exclude FTS5 virtual tables from e2e DB restore

The restoreDB function iterates all tables in sqlite_master and
runs DELETE + INSERT to reset state. FTS5 contentless virtual tables
cannot be directly deleted from. Since triggers handle FTS5 sync
automatically, simply skip tables matching *_fts and *_fts_* patterns.

* build: add compile-time guard for sqlite_fts5 build tag

Same pattern as netgo: compilation fails with a clear error if
the sqlite_fts5 build tag is missing.

* build: add sqlite_fts5 tag to reflex dev server config

* build: extract GO_BUILD_TAGS variable in Makefile to avoid duplication

* fix: strip leading * from FTS5 queries to prevent "unknown special query" error

* feat: auto-append prefix wildcard to FTS5 search tokens for broader matching

Every plain search token now gets a trailing * appended (e.g., "love" becomes
"love*"), so searching for "love" also matches "lovelace", "lovely", etc.
Quoted phrases are preserved as exact matches without wildcards. Results are
ordered alphabetically by name/title, so shorter exact matches naturally
appear first.

* fix: clarify comments about FTS5 operator neutralization

The comments said "strip" but the code lowercases operators to
neutralize them (FTS5 operators are case-sensitive). Updated comments
to accurately describe the behavior.

* fix: use fmt.Sprintf for FTS5 phrase placeholders

The previous encoding used rune('0'+index) which silently breaks with
10+ quoted phrases. Use fmt.Sprintf for arbitrary index support.

* fix: validate and normalize SearchBackend config option

Normalize the value to lowercase and fall back to "fts" with a log
warning for unrecognized values. This prevents silent misconfiguration
from typos like "FTS", "Legacy", or "fts5".

* refactor: improve documentation for build tags and FTS5 requirements

Signed-off-by: Deluan <deluan@navidrome.org>

* refactor: convert FTS5 query and search backend normalization tests to DescribeTable format

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: add sqlite_fts5 build tag to golangci configuration

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: add UISearchDebounceMs configuration option and update related components

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: fall back to legacy search when SearchFullString is enabled

FTS5 is token-based and cannot match substrings within words, so
getSearchExpr now returns legacySearchExpr when SearchFullString
is true, regardless of SearchBackend setting.

* fix: add sqlite_fts5 build tag to CI pipeline and Dockerfile

* fix: add WHEN clauses to FTS5 AFTER UPDATE triggers

Added WHEN clauses to the media_file_fts_au, album_fts_au, and
artist_fts_au triggers so they only fire when FTS-indexed columns
actually change. Previously, every row update (e.g., play count, rating,
starred status) triggered an unnecessary delete+insert cycle in the FTS
shadow tables. The WHEN clauses use IS NOT for NULL-safe comparison of
each indexed column, avoiding FTS index churn for non-indexed updates.

* feat: add SearchBackend configuration option to data and insights components

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: enhance input sanitization for FTS5 by stripping additional punctuation and special characters

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: add search_normalized column for punctuated name search (R.E.M., AC/DC)

Add index-time normalization and query-time single-letter collapsing to
fix FTS5 search for punctuated names. A new search_normalized column
stores concatenated forms of punctuated words (e.g., "R.E.M." → "REM",
"AC/DC" → "ACDC") and is indexed in FTS5 tables. At query time, runs of
consecutive single letters (from dot-stripping) are collapsed into OR
expressions like ("R E M" OR REM*) to match both the original tokens and
the normalized form. This enables searching by "R.E.M.", "REM", "AC/DC",
"ACDC", "A-ha", or "Aha" and finding the correct results.

* refactor: simplify isSingleUnicodeLetter to avoid []rune allocation

Use utf8.DecodeRuneInString to check for a single Unicode letter
instead of converting the entire string to a []rune slice.

* feat: define ftsSearchColumns for flexible FTS5 search column inclusion

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: update collapseSingleLetterRuns to return quoted phrases for abbreviations

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: implement extractPunctuatedWords to handle artist/album names with embedded punctuation

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: implement extractPunctuatedWords to handle artist/album names with embedded punctuation

Signed-off-by: Deluan <deluan@navidrome.org>

* refactor: punctuated word handling to improve processing of artist/album names

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: add CJK support for search queries with LIKE filters

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: enhance FTS5 search by adding album version support and CJK handling

Signed-off-by: Deluan <deluan@navidrome.org>

* refactor: search configuration to use structured options

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: enhance search functionality to support punctuation-only queries and update related tests

Signed-off-by: Deluan <deluan@navidrome.org>

---------

Signed-off-by: Deluan <deluan@navidrome.org>
2026-02-21 17:52:42 -05:00

469 lines
15 KiB
Go

package persistence
import (
"context"
"fmt"
"slices"
"strconv"
"strings"
"sync"
"time"
. "github.com/Masterminds/squirrel"
"github.com/deluan/rest"
"github.com/google/uuid"
"github.com/navidrome/navidrome/conf"
"github.com/navidrome/navidrome/log"
"github.com/navidrome/navidrome/model"
"github.com/navidrome/navidrome/utils/slice"
"github.com/pocketbase/dbx"
)
type mediaFileRepository struct {
sqlRepository
}
type dbMediaFile struct {
*model.MediaFile `structs:",flatten"`
Participants string `structs:"-" json:"-"`
Tags string `structs:"-" json:"-"`
// These are necessary to map the correct names (rg_*) to the correct fields (RG*)
// without using `db` struct tags in the model.MediaFile struct
RgAlbumGain *float64 `structs:"-" json:"-"`
RgAlbumPeak *float64 `structs:"-" json:"-"`
RgTrackGain *float64 `structs:"-" json:"-"`
RgTrackPeak *float64 `structs:"-" json:"-"`
}
func (m *dbMediaFile) PostScan() error {
m.RGTrackGain = m.RgTrackGain
m.RGTrackPeak = m.RgTrackPeak
m.RGAlbumGain = m.RgAlbumGain
m.RGAlbumPeak = m.RgAlbumPeak
var err error
m.MediaFile.Participants, err = unmarshalParticipants(m.Participants)
if err != nil {
return fmt.Errorf("parsing media_file from db: %w", err)
}
if m.Tags != "" {
m.MediaFile.Tags, err = unmarshalTags(m.Tags)
if err != nil {
return fmt.Errorf("parsing media_file from db: %w", err)
}
m.Genre, m.Genres = m.MediaFile.Tags.ToGenres()
}
return nil
}
func (m *dbMediaFile) PostMapArgs(args map[string]any) error {
fullText := []string{m.FullTitle(), m.Album, m.Artist, m.AlbumArtist,
m.SortTitle, m.SortAlbumName, m.SortArtistName, m.SortAlbumArtistName, m.DiscSubtitle}
participantNames := m.MediaFile.Participants.AllNames()
fullText = append(fullText, participantNames...)
args["full_text"] = formatFullText(fullText...)
args["search_participants"] = strings.Join(participantNames, " ")
args["search_normalized"] = normalizeForFTS(m.FullTitle(), m.Album, m.Artist, m.AlbumArtist)
args["tags"] = marshalTags(m.MediaFile.Tags)
args["participants"] = marshalParticipants(m.MediaFile.Participants)
return nil
}
type dbMediaFiles []dbMediaFile
func (m dbMediaFiles) toModels() model.MediaFiles {
return slice.Map(m, func(mf dbMediaFile) model.MediaFile { return *mf.MediaFile })
}
func NewMediaFileRepository(ctx context.Context, db dbx.Builder) model.MediaFileRepository {
r := &mediaFileRepository{}
r.ctx = ctx
r.db = db
r.tableName = "media_file"
r.registerModel(&model.MediaFile{}, mediaFileFilter())
r.setSortMappings(map[string]string{
"title": "order_title",
"artist": "order_artist_name, order_album_name, release_date, disc_number, track_number",
"album_artist": "order_album_artist_name, order_album_name, release_date, disc_number, track_number",
"album": "order_album_name, album_id, disc_number, track_number, order_artist_name, title",
"random": "random",
"created_at": "media_file.created_at",
"recently_added": mediaFileRecentlyAddedSort(),
"starred_at": "starred, starred_at",
"rated_at": "rating, rated_at",
})
return r
}
var mediaFileFilter = sync.OnceValue(func() map[string]filterFunc {
filters := map[string]filterFunc{
"id": idFilter("media_file"),
"title": fullTextFilter("media_file", "mbz_recording_id", "mbz_release_track_id"),
"starred": annotationBoolFilter("starred"),
"genre_id": tagIDFilter,
"missing": booleanFilter,
"artists_id": artistFilter,
"library_id": libraryIdFilter,
}
// Add all album tags as filters
for tag := range model.TagMappings() {
if _, exists := filters[string(tag)]; !exists {
filters[string(tag)] = tagIDFilter
}
}
return filters
})
func mediaFileRecentlyAddedSort() string {
if conf.Server.RecentlyAddedByModTime {
return "media_file.updated_at"
}
return "media_file.created_at"
}
func (r *mediaFileRepository) CountAll(options ...model.QueryOptions) (int64, error) {
query := r.newSelect()
query = r.withAnnotation(query, "media_file.id")
query = r.applyLibraryFilter(query)
return r.count(query, options...)
}
func (r *mediaFileRepository) CountBySuffix(options ...model.QueryOptions) (map[string]int64, error) {
sel := r.newSelect(options...).
Columns("lower(suffix) as suffix", "count(*) as count").
GroupBy("lower(suffix)")
var res []struct {
Suffix string
Count int64
}
err := r.queryAll(sel, &res)
if err != nil {
return nil, err
}
counts := make(map[string]int64, len(res))
for _, c := range res {
counts[c.Suffix] = c.Count
}
return counts, nil
}
func (r *mediaFileRepository) Exists(id string) (bool, error) {
return r.exists(Eq{"media_file.id": id})
}
func (r *mediaFileRepository) Put(m *model.MediaFile) error {
if m.CreatedAt.IsZero() {
m.CreatedAt = time.Now()
}
id, err := r.putByMatch(Eq{"path": m.Path, "library_id": m.LibraryID}, m.ID, &dbMediaFile{MediaFile: m})
if err != nil {
return err
}
m.ID = id
return r.updateParticipants(m.ID, m.Participants)
}
func (r *mediaFileRepository) selectMediaFile(options ...model.QueryOptions) SelectBuilder {
sql := r.newSelect(options...).Columns("media_file.*", "library.path as library_path", "library.name as library_name").
LeftJoin("library on media_file.library_id = library.id")
sql = r.withAnnotation(sql, "media_file.id")
sql = r.withBookmark(sql, "media_file.id")
return r.applyLibraryFilter(sql)
}
func (r *mediaFileRepository) Get(id string) (*model.MediaFile, error) {
res, err := r.GetAll(model.QueryOptions{Filters: Eq{"media_file.id": id}})
if err != nil {
return nil, err
}
if len(res) == 0 {
return nil, model.ErrNotFound
}
return &res[0], nil
}
func (r *mediaFileRepository) GetWithParticipants(id string) (*model.MediaFile, error) {
m, err := r.Get(id)
if err != nil {
return nil, err
}
m.Participants, err = r.getParticipants(m)
return m, err
}
func (r *mediaFileRepository) GetAll(options ...model.QueryOptions) (model.MediaFiles, error) {
sq := r.selectMediaFile(options...)
var res dbMediaFiles
err := r.queryAll(sq, &res, options...)
if err != nil {
return nil, err
}
return res.toModels(), nil
}
func (r *mediaFileRepository) GetAllByTags(tag model.TagName, values []string, options ...model.QueryOptions) (model.MediaFiles, error) {
placeholders := make([]string, len(values))
args := make([]any, len(values))
for i, v := range values {
placeholders[i] = "?"
args[i] = v
}
tagFilter := Expr(
fmt.Sprintf("exists (select 1 from json_tree(media_file.tags, '$.%s') where key='value' and value in (%s))",
tag, strings.Join(placeholders, ",")),
args...,
)
var opts model.QueryOptions
if len(options) > 0 {
opts = options[0]
}
if opts.Filters != nil {
opts.Filters = And{tagFilter, opts.Filters}
} else {
opts.Filters = tagFilter
}
return r.GetAll(opts)
}
func (r *mediaFileRepository) GetCursor(options ...model.QueryOptions) (model.MediaFileCursor, error) {
sq := r.selectMediaFile(options...)
cursor, err := queryWithStableResults[dbMediaFile](r.sqlRepository, sq)
if err != nil {
return nil, err
}
return func(yield func(model.MediaFile, error) bool) {
for m, err := range cursor {
if m.MediaFile == nil {
yield(model.MediaFile{}, fmt.Errorf("unexpected nil mediafile: %v", m))
return
}
if !yield(*m.MediaFile, err) || err != nil {
return
}
}
}, nil
}
// FindByPaths finds media files by their paths.
// The paths can be library-qualified (format: "libraryID:path") or unqualified ("path").
// Library-qualified paths search within the specified library, while unqualified paths
// search across all libraries for backward compatibility.
func (r *mediaFileRepository) FindByPaths(paths []string) (model.MediaFiles, error) {
query := Or{}
for _, path := range paths {
parts := strings.SplitN(path, ":", 2)
if len(parts) == 2 {
// Library-qualified path: "libraryID:path"
libraryID, err := strconv.Atoi(parts[0])
if err != nil {
// Invalid format, skip
continue
}
relativePath := parts[1]
query = append(query, And{
Eq{"path collate nocase": relativePath},
Eq{"library_id": libraryID},
})
} else {
// Unqualified path: search across all libraries
query = append(query, Eq{"path collate nocase": path})
}
}
if len(query) == 0 {
return model.MediaFiles{}, nil
}
sel := r.newSelect().Columns("*").Where(query)
var res dbMediaFiles
if err := r.queryAll(sel, &res); err != nil {
return nil, err
}
return res.toModels(), nil
}
func (r *mediaFileRepository) Delete(id string) error {
return r.delete(Eq{"id": id})
}
func (r *mediaFileRepository) DeleteAllMissing() (int64, error) {
user := loggedUser(r.ctx)
if !user.IsAdmin {
return 0, rest.ErrPermissionDenied
}
del := Delete(r.tableName).Where(Eq{"missing": true})
return r.executeSQL(del)
}
func (r *mediaFileRepository) DeleteMissing(ids []string) error {
user := loggedUser(r.ctx)
if !user.IsAdmin {
return rest.ErrPermissionDenied
}
return r.delete(
And{
Eq{"missing": true},
Eq{"id": ids},
},
)
}
func (r *mediaFileRepository) MarkMissing(missing bool, mfs ...*model.MediaFile) error {
ids := slice.SeqFunc(mfs, func(m *model.MediaFile) string { return m.ID })
for chunk := range slice.CollectChunks(ids, 200) {
upd := Update(r.tableName).
Set("missing", missing).
Set("updated_at", time.Now()).
Where(Eq{"id": chunk})
c, err := r.executeSQL(upd)
if err != nil || c == 0 {
log.Error(r.ctx, "Error setting mediafile missing flag", "ids", chunk, err)
return err
}
log.Debug(r.ctx, "Marked missing mediafiles", "total", c, "ids", chunk)
}
return nil
}
func (r *mediaFileRepository) MarkMissingByFolder(missing bool, folderIDs ...string) error {
for chunk := range slices.Chunk(folderIDs, 200) {
upd := Update(r.tableName).
Set("missing", missing).
Set("updated_at", time.Now()).
Where(And{
Eq{"folder_id": chunk},
Eq{"missing": !missing},
})
c, err := r.executeSQL(upd)
if err != nil {
log.Error(r.ctx, "Error setting mediafile missing flag", "folderIDs", chunk, err)
return err
}
log.Debug(r.ctx, "Marked missing mediafiles from missing folders", "total", c, "folders", chunk)
}
return nil
}
// GetMissingAndMatching returns all mediafiles that are missing and their potential matches (comparing PIDs)
// that were added/updated after the last scan started. The result is ordered by PID.
// It does not need to load bookmarks, annotations and participants, as they are not used by the scanner.
func (r *mediaFileRepository) GetMissingAndMatching(libId int) (model.MediaFileCursor, error) {
subQ := r.newSelect().Columns("pid").
Where(And{
Eq{"media_file.missing": true},
Eq{"library_id": libId},
})
subQText, subQArgs, err := subQ.PlaceholderFormat(Question).ToSql()
if err != nil {
return nil, err
}
sel := r.newSelect().Columns("media_file.*", "library.path as library_path", "library.name as library_name").
LeftJoin("library on media_file.library_id = library.id").
Where("pid in ("+subQText+")", subQArgs...).
Where(Or{
Eq{"missing": true},
ConcatExpr("media_file.created_at > library.last_scan_started_at"),
}).
OrderBy("pid")
cursor, err := queryWithStableResults[dbMediaFile](r.sqlRepository, sel)
if err != nil {
return nil, err
}
return func(yield func(model.MediaFile, error) bool) {
for m, err := range cursor {
if !yield(*m.MediaFile, err) || err != nil {
return
}
}
}, nil
}
// FindRecentFilesByMBZTrackID finds recently added files by MusicBrainz Track ID in other libraries
// It uses a lightweight query without annotation/bookmark joins since those are not needed for matching
func (r *mediaFileRepository) FindRecentFilesByMBZTrackID(missing model.MediaFile, since time.Time) (model.MediaFiles, error) {
sel := r.newSelect().Columns("media_file.*", "library.path as library_path", "library.name as library_name").
LeftJoin("library on media_file.library_id = library.id").
Where(And{
NotEq{"media_file.library_id": missing.LibraryID},
Eq{"media_file.mbz_release_track_id": missing.MbzReleaseTrackID},
NotEq{"media_file.mbz_release_track_id": ""}, // Exclude empty MBZ Track IDs
Eq{"media_file.suffix": missing.Suffix},
Gt{"media_file.created_at": since},
Eq{"media_file.missing": false},
}).OrderBy("media_file.created_at DESC")
var res dbMediaFiles
err := r.queryAll(sel, &res)
if err != nil {
return nil, err
}
return res.toModels(), nil
}
// FindRecentFilesByProperties finds recently added files by intrinsic properties in other libraries
// It uses a lightweight query without annotation/bookmark joins since those are not needed for matching
func (r *mediaFileRepository) FindRecentFilesByProperties(missing model.MediaFile, since time.Time) (model.MediaFiles, error) {
sel := r.newSelect().Columns("media_file.*", "library.path as library_path", "library.name as library_name").
LeftJoin("library on media_file.library_id = library.id").
Where(And{
NotEq{"media_file.library_id": missing.LibraryID},
Eq{"media_file.title": missing.Title},
Eq{"media_file.size": missing.Size},
Eq{"media_file.suffix": missing.Suffix},
Eq{"media_file.disc_number": missing.DiscNumber},
Eq{"media_file.track_number": missing.TrackNumber},
Eq{"media_file.album": missing.Album},
Eq{"media_file.mbz_release_track_id": ""}, // Exclude files with MBZ Track ID
Gt{"media_file.created_at": since},
Eq{"media_file.missing": false},
}).OrderBy("media_file.created_at DESC")
var res dbMediaFiles
err := r.queryAll(sel, &res)
if err != nil {
return nil, err
}
return res.toModels(), nil
}
func (r *mediaFileRepository) Search(q string, offset int, size int, options ...model.QueryOptions) (model.MediaFiles, error) {
var res dbMediaFiles
if uuid.Validate(q) == nil {
err := r.searchByMBID(r.selectMediaFile(options...), q, []string{"mbz_recording_id", "mbz_release_track_id"}, &res)
if err != nil {
return nil, fmt.Errorf("searching media_file by MBID %q: %w", q, err)
}
} else {
err := r.doSearch(r.selectMediaFile(options...), q, offset, size, &res, "media_file.rowid", "title")
if err != nil {
return nil, fmt.Errorf("searching media_file by query %q: %w", q, err)
}
}
return res.toModels(), nil
}
func (r *mediaFileRepository) Count(options ...rest.QueryOptions) (int64, error) {
return r.CountAll(r.parseRestOptions(r.ctx, options...))
}
func (r *mediaFileRepository) Read(id string) (any, error) {
return r.Get(id)
}
func (r *mediaFileRepository) ReadAll(options ...rest.QueryOptions) (any, error) {
return r.GetAll(r.parseRestOptions(r.ctx, options...))
}
func (r *mediaFileRepository) EntityName() string {
return "mediafile"
}
func (r *mediaFileRepository) NewInstance() any {
return &model.MediaFile{}
}
var _ model.MediaFileRepository = (*mediaFileRepository)(nil)
var _ model.ResourceRepository = (*mediaFileRepository)(nil)