docs(k8s): document that Kubernetes support needs a non-default storage backend

Closes: #1602
Signed-off-by: Xe Iaso <me@xeiaso.net>
perf(internal/gzip): pool *gzip.Writer per middleware instance (#1654)
2026-06-09 22:08:15 +00:00 · 2026-06-01 10:29:23 -04:00 · 2026-05-30 00:52:37 -04:00 · 2026-05-30 00:48:43 -04:00
18 changed files with 453 additions and 22 deletions
@@ -41,3 +41,6 @@ setuplistener
 mba
 xfu
 xou
+AWOO
+firewalls
+bindhosts
@@ -28,10 +28,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Enable [HTTP basic auth](./admin/policies.mdx#http-basic-authentication) for the metrics server.
 - Fix a bug in the dataset poisoning maze that could allow denial of service [#1580](https://github.com/TecharoHQ/anubis/issues/1580).
 - Add config option to add ASN to logs/metrics.
-- Log weight when issuing challenge
+- Log weight when issuing challenge.
+- Gate pprof endpoints behind `metrics.debug` in the policy file.
+- Limit naive honeypot r9k delay to one second.
+- Fix an obscure case where appending query values to a subrequest URI could cause an invalid rule match when using path-based matching for protected resources.
+- Fix an edge case where load average expression values could cause a nil pointer dereference just after Anubis starts up.
+- Fix an obscure case where Anubis in subrequest mode could allow redirects to invalid domains when given a scheme-less `//host` or opaque `redir` value.
 - Fix `path_regex` and CEL `path` rules not matching when using Traefik `forwardAuth` middleware. Anubis now checks `X-Forwarded-Uri` (Traefik) in addition to `X-Original-URI` (nginx) when resolving the request path in subrequest mode ([#1628](https://github.com/TecharoHQ/anubis/issues/1628)).
+- Validate bounds in the CEL `randInt` helper so non-positive or platform-overflowing arguments surface a typed CEL error instead of an evaluator panic.
+- Fix a race in the bbolt store where the asynchronous cleanup scheduled by an expired read could delete a value that had just been refreshed; the delete now only fires when the key still carries the same expired generation it observed.
 - Marginally improve the performance of request processing.
 - Marginally improve the performance of PoW validation.
+- Significantly improve the performance of the gzip middleware.

 ## v1.25.0: Necron

@@ -131,11 +131,27 @@ Then point your Ingress to the Anubis port:
              name: anubis
 ```

+## Storage
+
+By default, Anubis stores all of its data in memory. This memory is not shared between pods. If you run multiple instances of Anubis without both [storing data outside of memory](../policies.mdx#storage-backends) and a [shared cookie key](../installation.mdx#key-generation), you will run into [unexpected behaviour](https://github.com/TecharoHQ/anubis/issues/1602) when user traffic moves between pods.
+
+Depending on how your Kubernetes cluster is deployed, these are the storage backends to choose from:
+
+| Backend  | Pro                                                             | Con                                                                                          |
+| :------- | :-------------------------------------------------------------- | :------------------------------------------------------------------------------------------- |
+| `bbolt`  | Only requires a ReadWriteOnce PVC.                              | Does not support more than one Anubis pod.                                                   |
+| `memory` | Requires no configuration.                                      | Process memory is not shared between pods.                                                   |
+| `s3api`  | Great if your cluster includes Rook/Ceph to use RADOS directly. | Potentially higher latency unless you use a store like [Tigris](https://www.tigrisdata.com). |
+| `valkey` | Trivial to configure in your cluster.                           | If your Redis/Valkey server is down, Anubis is going to have issues.                         |
+
+Pick your poison accordingly. Many production deployments use the `s3api` and `valkey` backends without issue. Single node deployments can get away with either `memory` or `bbolt` depending on the facts and circumstances of the deployment.
+
 ## Envoy Gateway

 If you are using envoy-gateway, the `X-Real-Ip` header is not set by default, but Anubis does require it. You can resolve this by adding the header, either on the specific `HTTPRoute` where Anubis is listening, or on the `ClientTrafficPolicy` to apply it to any number of Gateways:

 HTTPRoute:
+
 ```yaml
 apiVersion: gateway.networking.k8s.io/v1
 kind: HTTPRoute
@@ -160,6 +176,7 @@ spec:
 ```

 Applying to any number of Gateways:
+
 ```yaml
 apiVersion: gateway.envoyproxy.io/v1alpha1
 kind: ClientTrafficPolicy
@@ -138,6 +138,24 @@ metrics:
  socketMode: "0700" # must be a string
 ```

+### Debug routes
+
+Anubis' metrics server supports [pprof](https://pkg.go.dev/runtime/pprof), the Go standard library tool for profiling Go applications. This is very useful for debugging how Anubis behaves in the wild with regard to CPU, multicore, and RAM usage. pprof is very powerful and can expose command line arguments as part of the debugging setup (inside Google, everything is done with command line flags).
+
+Prior versions of Anubis exposed pprof endpoints on all TCP bindhosts by default. This means that machines with incorrectly configured firewalls can expose command line arguments to the public internet in the right conditions.
+
+To enable pprof profiling endpoints on the metrics server, set the `debug` flag under the `metrics` block:
+
+```yaml
+metrics:
+  bind: ":9090"
+  network: "tcp"
+
+  debug: true
+```
+
+To err on the side of caution, this defaults to disabled. If this change of default breaks your configuration, please let us know in a ticket.
+
 ### TLS

 If you want to serve the metrics server over TLS, use the `tls` block:
@@ -201,8 +219,11 @@ Anubis offers the following storage backends:

 - [`memory`](#memory) -- A simple in-memory hashmap
 - [`bbolt`](#bbolt) -- An on-disk key/value store backed by [bbolt](https://github.com/etcd-io/bbolt), an embedded key/value database for Go programs
+- [`s3api`](#s3api) -- Amazon S3 based storage or another compatible object store
 - [`valkey`](#valkey) -- A remote in-memory key/value database backed by [Valkey](https://valkey.io/) (or another database compatible with the [RESP](https://redis.io/docs/latest/develop/reference/protocol-spec/) protocol)

+:::warning
+
 If no storage backend is set in the policy file, Anubis will use the [`memory`](#memory) backend by default. This is equivalent to the following in the policy file:

 ```yaml
@@ -211,6 +232,10 @@ store:
  parameters: {}
 ```

+This means that all session data that is required for the challenge mechanism to work is stored **IN PROCESS MEMORY** that is **NOT** shared between instances of Anubis. If you set up Anubis with multiple instances using the `memory` storage backend, your users will sometimes get "Administrator has misconfigured Anubis" error messages when it cannot look up the aforementioned session data.
+
+:::
+
 ### `memory`

 The memory backend is an in-memory cache. This backend works best if you don't use multiple instances of Anubis or don't have mutable storage in the environment you're running Anubis in.
@@ -2,11 +2,28 @@ package internal

 import (
 	"compress/gzip"
+	"io"
 	"net/http"
 	"strings"
+	"sync"
 )

 func GzipMiddleware(level int, next http.Handler) http.Handler {
+	// Validate the level once at setup; gzip.NewWriterLevel only fails for
+	// invalid levels and we'd rather panic now than mid-request.
+	if _, err := gzip.NewWriterLevel(io.Discard, level); err != nil {
+		panic(err)
+	}
+
+	// Per-middleware pool of *gzip.Writer. Each entry carries ~40 KiB of
+	// deflate buffers; reusing them avoids that allocation on every request.
+	pool := sync.Pool{
+		New: func() any {
+			gz, _ := gzip.NewWriterLevel(io.Discard, level)
+			return gz
+		},
+	}
+
 	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
 			next.ServeHTTP(w, r)
@@ -14,11 +31,13 @@ func GzipMiddleware(level int, next http.Handler) http.Handler {
 		}

 		w.Header().Set("Content-Encoding", "gzip")
-		gz, err := gzip.NewWriterLevel(w, level)
-		if err != nil {
-			panic(err)
-		}
-		defer gz.Close()
+		gz := pool.Get().(*gzip.Writer)
+		gz.Reset(w)
+		defer func() {
+			gz.Close()
+			gz.Reset(io.Discard)
+			pool.Put(gz)
+		}()

 		grw := gzipResponseWriter{ResponseWriter: w, sink: gz}
 		next.ServeHTTP(grw, r)
@@ -169,7 +169,7 @@ func (i *Impl) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 		}
 	}

-	millisecondAmount := math.Pow(float64(networkCount), 2)
+	millisecondAmount := min(math.Pow(float64(networkCount), 2), 1000)
 	time.Sleep(time.Duration(millisecondAmount) * time.Millisecond)

 	spins := i.makeSpins()
@@ -32,6 +32,7 @@ type Metrics struct {
 	Network    string            `json:"network" yaml:"network"`
 	SocketMode string            `json:"socketMode" yaml:"socketMode"`
 	TLS        *MetricsTLS       `json:"tls" yaml:"tls"`
+	Debug      bool              `json:"debug" yaml:"debug"`
 	BasicAuth  *MetricsBasicAuth `json:"basicAuth" yaml:"basicAuth"`
 }

@@ -403,14 +403,15 @@ func (s *Server) ServeHTTPNext(w http.ResponseWriter, r *http.Request) {
 		localizer := localization.GetLocalizer(r)

 		redir := r.FormValue("redir")
-		urlParsed, err := url.ParseRequestURI(redir)
+		urlParsed, err := url.Parse(redir)
 		if err != nil {
-			// if ParseRequestURI fails, try as relative URL
-			urlParsed, err = r.URL.Parse(redir)
-			if err != nil {
-				s.respondWithStatus(w, r, localizer.T("redirect_not_parseable"), makeCode(err), http.StatusBadRequest)
-				return
-			}
+			s.respondWithStatus(w, r, localizer.T("redirect_not_parseable"), makeCode(err), http.StatusBadRequest)
+			return
+		}
+
+		if urlParsed.Opaque != "" || (urlParsed.Scheme == "" && strings.HasPrefix(redir, "//")) {
+			s.respondWithStatus(w, r, localizer.T("invalid_redirect"), "", http.StatusBadRequest)
+			return
 		}

 		// validate URL scheme to prevent javascript:, data:, file:, tel:, etc.
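
Reviewer note: to see why the two new conditions matter, it helps to look at what `url.Parse` returns for the inputs they target. The sketch below assumes a hypothetical function `rejectRedirect` that applies only the checks added in this hunk; the real handler goes on to validate the scheme and the allowed redirect domains, which are not modelled here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rejectRedirect is a hypothetical stand-alone version of the checks added in
// this hunk. It reports true when the redir value is unparseable, opaque
// (scheme:data with no leading slash), or scheme-relative (//host/path).
func rejectRedirect(redir string) bool {
	u, err := url.Parse(redir)
	if err != nil {
		return true
	}
	return u.Opaque != "" || (u.Scheme == "" && strings.HasPrefix(redir, "//"))
}

func main() {
	for _, redir := range []string{
		"//evil.example/phish",         // scheme-relative: parses with an empty Scheme and a Host
		"javascript:alert(1)",          // opaque: everything after the scheme lands in Opaque
		"https://allowed.example/page", // absolute URL: passes on to the later checks
		"/relative/path",               // plain relative path: passes on to the later checks
	} {
		fmt.Printf("%-32q rejected=%v\n", redir, rejectRedirect(redir))
	}
}
```

A browser resolves `//evil.example/phish` against the current scheme, so it navigates to another host even though the value carries no scheme of its own.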
@@ -223,3 +223,17 @@ func TestNoCacheOnError(t *testing.T) {
 		})
 	}
 }
+
+func TestRejectsHostlessRedirect(t *testing.T) {
+	pol := loadPolicies(t, "testdata/useragent.yaml", 0)
+	srv := spawnAnubis(t, Options{Policy: pol, RedirectDomains: []string{"allowed.example"}})
+	req := httptest.NewRequest(http.MethodGet, "https://anubis.example/.within.website/?redir=%2f%2fevil.example%2fphish", nil)
+	rr := httptest.NewRecorder()
+	srv.ServeHTTPNext(rr, req)
+	if rr.Code != http.StatusBadRequest {
+		t.Fatalf("expected hostless redirect to be rejected, got HTTP %d body %q", rr.Code, rr.Body.String())
+	}
+	if got := rr.Header().Get("Location"); got != "" {
+		t.Fatalf("expected no Location header on rejected redirect, got %q", got)
+	}
+}
@@ -34,11 +34,15 @@ func (s *Server) Run(ctx context.Context, done func()) {

 func (s *Server) run(ctx context.Context, lg *slog.Logger) error {
 	mux := http.NewServeMux()
-	mux.HandleFunc("GET /debug/pprof/", pprof.Index)
-	mux.HandleFunc("GET /debug/pprof/cmdline", pprof.Cmdline)
-	mux.HandleFunc("GET /debug/pprof/profile", pprof.Profile)
-	mux.HandleFunc("GET /debug/pprof/symbol", pprof.Symbol)
-	mux.HandleFunc("GET /debug/pprof/trace", pprof.Trace)
+
+	if s.Config.Debug {
+		mux.HandleFunc("GET /debug/pprof/", pprof.Index)
+		mux.HandleFunc("GET /debug/pprof/cmdline", pprof.Cmdline)
+		mux.HandleFunc("GET /debug/pprof/profile", pprof.Profile)
+		mux.HandleFunc("GET /debug/pprof/symbol", pprof.Symbol)
+		mux.HandleFunc("GET /debug/pprof/trace", pprof.Trace)
+	}
+
 	mux.Handle("/metrics", promhttp.Handler())
 	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
 		st, ok := internal.GetHealth("anubis")
@@ -0,0 +1,49 @@
+package metrics
+
+import (
+	"context"
+	"io"
+	"log/slog"
+	"net"
+	"net/http"
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/TecharoHQ/anubis/lib/config"
+)
+
+func TestMetricsPprofCmdlineExposedWithoutAuthentication(t *testing.T) {
+	ln, err := net.Listen("tcp", "127.0.0.1:0")
+	if err != nil {
+		t.Fatal(err)
+	}
+	addr := ln.Addr().String()
+	_ = ln.Close()
+
+	ctx, cancel := context.WithCancel(context.Background())
+	defer cancel()
+	done := make(chan struct{})
+	srv := &Server{
+		Config: &config.Metrics{Network: "tcp", Bind: addr},
+		Log:    slog.Default(),
+	}
+	go srv.Run(ctx, func() { close(done) })
+
+	url := "http://" + addr + "/debug/pprof/cmdline"
+	var body []byte
+	resp, err := http.Get(url)
+	if err == nil {
+		body, err = io.ReadAll(resp.Body)
+		if err != nil {
+			t.Fatalf("can't read body: %v", err)
+		}
+		defer resp.Body.Close()
+	}
+	time.Sleep(50 * time.Millisecond)
+	if strings.Contains(string(body), "metrics.test") {
+		t.Fatalf("pprof is enabled by default, cmdline process arguments: %q", string(body))
+	}
+	cancel()
+	<-done
+}
@@ -5,6 +5,7 @@ import (
 	"fmt"
 	"net/http"
 	"net/netip"
+	"net/url"
 	"regexp"
 	"strings"

@@ -114,6 +115,9 @@ func (pc *PathChecker) Check(r *http.Request) (bool, error) {
 			originalUrl = r.Header.Get("X-Forwarded-Uri")
 		}
 		if originalUrl != "" {
+			if parsed, err := url.ParseRequestURI(originalUrl); err == nil {
+				originalUrl = parsed.Path
+			}
 			if pc.regexp.MatchString(originalUrl) {
 				return true, nil
 			}
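
Reviewer note: the effect of the added parse step on an anchored pattern can be sketched as follows. `pathOnly` is a hypothetical helper that mirrors the three added lines, and the pattern `^/admin$` is taken from the test added later in this diff.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// pathOnly is a hypothetical helper mirroring the added lines: if the
// forwarded URI parses as a request URI, keep only its path component;
// otherwise fall back to the raw header value.
func pathOnly(originalURL string) string {
	if parsed, err := url.ParseRequestURI(originalURL); err == nil {
		return parsed.Path
	}
	return originalURL
}

func main() {
	re := regexp.MustCompile(`^/admin$`)
	raw := "/admin?x=1"

	// Without stripping, the query string defeats the end anchor.
	fmt.Println("raw match:     ", re.MatchString(raw))
	// With stripping, only the path is matched.
	fmt.Println("stripped match:", re.MatchString(pathOnly(raw)))
}
```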
@@ -222,7 +222,16 @@ func New(opts ...cel.EnvOption) (*cel.Env, error) {
 						return types.ValOrErr(val, "value is not an integer, but is %T", val)
 					}

-					return types.Int(rand.IntN(int(n)))
+					if n <= 0 {
+						return types.NewErr("randInt bound must be positive, got %d", int64(n))
+					}
+
+					bound := int(n)
+					if types.Int(bound) != n {
+						return types.NewErr("randInt bound %d overflows platform int", int64(n))
+					}
+
+					return types.Int(rand.IntN(bound))
 				}),
 			),
 		),
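
Reviewer note: the same two bound checks can be exercised without a CEL environment. This is a minimal sketch, assuming a hypothetical function `checkedRandInt` that returns a Go `error` where the real helper returns a CEL error value; `rand.IntN` from `math/rand/v2` panics on a non-positive argument, which is the failure the checks prevent.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// checkedRandInt is a hypothetical plain-Go analogue of the guarded randInt
// helper. It rejects non-positive bounds and bounds that do not survive the
// conversion to the platform int (relevant on 32-bit targets).
func checkedRandInt(n int64) (int, error) {
	if n <= 0 {
		return 0, fmt.Errorf("randInt bound must be positive, got %d", n)
	}

	bound := int(n)
	if int64(bound) != n {
		return 0, fmt.Errorf("randInt bound %d overflows platform int", n)
	}

	return rand.IntN(bound), nil
}

func main() {
	for _, n := range []int64{-5, 0, 10} {
		v, err := checkedRandInt(n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("randInt(%d) returned a value in [0, %d): %v\n", n, n, v >= 0 && int64(v) < n)
	}
}
```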
@@ -9,6 +9,7 @@ import (

 	"github.com/TecharoHQ/anubis/internal/dns"
 	"github.com/TecharoHQ/anubis/lib/store/memory"
+	"github.com/google/cel-go/cel"
 	"github.com/google/cel-go/common/types"
 	"github.com/google/cel-go/common/types/ref"
 )
@@ -688,6 +689,14 @@ func TestNewEnvironment(t *testing.T) {
 			description:   "should return values in correct range",
 			shouldCompile: true,
 		},
+		{
+			name:          "randInt-large-bound",
+			expression:    `randInt(2147483647) >= 0`,
+			variables:     map[string]any{},
+			expectBool:    boolPtr(true),
+			description:   "should accept int32-max bounds without overflow",
+			shouldCompile: true,
+		},
 		{
 			name:          "strings-extension-size",
 			expression:    `"hello".size() == 5`,
@@ -750,3 +759,65 @@ func TestNewEnvironment(t *testing.T) {
 func boolPtr(b bool) *bool {
 	return &b
 }
+
+func TestRandIntInvalidBounds(t *testing.T) {
+	env, err := New(cel.Variable("contentLength", cel.IntType))
+	if err != nil {
+		t.Fatalf("failed to create environment: %v", err)
+	}
+
+	tests := []struct {
+		name        string
+		expression  string
+		variables   map[string]any
+		wantErrText string
+		description string
+	}{
+		{
+			name:        "zero-bound-literal",
+			expression:  `randInt(0)`,
+			variables:   map[string]any{},
+			wantErrText: "randInt bound must be positive",
+			description: "randInt(0) should return a CEL error, not panic",
+		},
+		{
+			name:        "negative-bound-literal",
+			expression:  `randInt(-5)`,
+			variables:   map[string]any{},
+			wantErrText: "randInt bound must be positive",
+			description: "randInt(-5) should return a CEL error, not panic",
+		},
+		{
+			name:        "zero-bound-from-variable",
+			expression:  `randInt(contentLength)`,
+			variables:   map[string]any{"contentLength": 0},
+			wantErrText: "randInt bound must be positive",
+			description: "attacker-controlled zero contentLength should error gracefully",
+		},
+		{
+			name:        "negative-bound-from-variable",
+			expression:  `randInt(contentLength)`,
+			variables:   map[string]any{"contentLength": -1},
+			wantErrText: "randInt bound must be positive",
+			description: "attacker-controlled negative contentLength should error gracefully",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			prog, err := Compile(env, tt.expression)
+			if err != nil {
+				t.Fatalf("failed to compile expression %q: %v", tt.expression, err)
+			}
+
+			result, _, err := prog.Eval(tt.variables)
+			if err == nil {
+				t.Fatalf("%s: expected an evaluation error, got result %v", tt.description, result)
+			}
+
+			if !strings.Contains(err.Error(), tt.wantErrText) {
+				t.Errorf("%s: expected error containing %q, got %q", tt.description, tt.wantErrText, err.Error())
+			}
+		})
+	}
+}
@@ -46,7 +46,7 @@ var (
 )

 func init() {
-	globalLoadAvg = &loadAvg{}
+	globalLoadAvg = &loadAvg{data: &load.AvgStat{}}
 	go globalLoadAvg.updateThread(context.Background())
 }

@@ -1,6 +1,8 @@
 package policy

 import (
+	"net/http"
+	"net/http/httptest"
 	"os"
 	"path/filepath"
 	"testing"
@@ -85,3 +87,27 @@ func TestBadConfigs(t *testing.T) {
 		})
 	}
 }
+
+func TestPathCheckerStripsForwardedURIQuery(t *testing.T) {
+	checker, err := NewPathChecker("^/admin$", true)
+	if err != nil {
+		t.Fatal(err)
+	}
+	req := httptest.NewRequest(http.MethodGet, "https://anubis.local/.within.website/x/cmd/anubis/api/check", nil)
+	req.Header.Set("X-Forwarded-Uri", "/admin?x=1")
+	matched, err := checker.Check(req)
+	if err != nil {
+		t.Fatal(err)
+	}
+	if !matched {
+		t.Fatalf("expected exact path checker to match forwarded URI when query string is appended")
+	}
+	req.Header.Set("X-Forwarded-Uri", "/admin")
+	matched, err = checker.Check(req)
+	if err != nil {
+		t.Fatal(err)
+	}
+	if !matched {
+		t.Fatalf("expected exact path checker to match forwarded URI without query string")
+	}
+}
@@ -50,6 +50,33 @@ func (s *Store) Delete(ctx context.Context, key string) error {
 	})
 }

+// deleteIfExpired removes key only if it still carries the exact expiry that an
+// expired Get observed and that expiry is still in the past.
+//
+// Get runs in a read-only transaction, so it can only schedule cleanup
+// asynchronously. Between observing the expiry and this delete running, another
+// request may Set a fresh value for the same key. Re-reading and matching the
+// observed expiry inside the write transaction makes the timestamp act as a
+// generation token: a refreshed value carries a different, future expiry and is
+// therefore left untouched (see AWOO-015).
+func (s *Store) deleteIfExpired(ctx context.Context, key string, observed time.Time) error {
+	return s.bdb.Update(func(tx *bbolt.Tx) error {
+		valueBkt := tx.Bucket([]byte(key))
+		if valueBkt == nil {
+			return nil
+		}
+
+		expiry, err := time.Parse(time.RFC3339Nano, string(valueBkt.Get([]byte("expiry"))))
+		if err != nil || !expiry.Equal(observed) || !time.Now().After(expiry) {
+			// Unparseable, refreshed to a different generation, or no longer
+			// expired: leave it for cleanup or a later Get to handle.
+			return nil
+		}
+
+		return tx.DeleteBucket([]byte(key))
+	})
+}
+
 // Get a value from the datastore.
 //
 // Because each value is stored in its own bucket with data and expiry keys,
@@ -77,7 +104,7 @@ func (s *Store) Get(ctx context.Context, key string) ([]byte, error) {
 		}

 		if time.Now().After(expiry) {
-			go s.Delete(context.Background(), key)
+			go s.deleteIfExpired(context.Background(), key, expiry)
 			return fmt.Errorf("%w: %q", store.ErrNotFound, key)
 		}

@@ -4,8 +4,10 @@ import (
 	"encoding/json"
 	"path/filepath"
 	"testing"
+	"time"

 	"github.com/TecharoHQ/anubis/lib/store/storetest"
+	"go.etcd.io/bbolt"
 )

 func TestImpl(t *testing.T) {
@@ -20,3 +22,154 @@ func TestImpl(t *testing.T) {

 	storetest.Common(t, Factory{}, json.RawMessage(data))
 }
+
+// newTestStore returns a Store backed by a throwaway bbolt database that is
+// closed when the test finishes.
+func newTestStore(t *testing.T) *Store {
+	t.Helper()
+
+	db, err := bbolt.Open(filepath.Join(t.TempDir(), "db"), 0600, nil)
+	if err != nil {
+		t.Fatalf("can't open bbolt database: %v", err)
+	}
+	t.Cleanup(func() { db.Close() })
+
+	return &Store{bdb: db}
+}
+
+// mustSet writes a value with the given relative expiry, failing the test on error.
+func mustSet(t *testing.T, s *Store, key, value string, expiry time.Duration) {
+	t.Helper()
+
+	if err := s.Set(t.Context(), key, []byte(value), expiry); err != nil {
+		t.Fatalf("Set(%q): %v", key, err)
+	}
+}
+
+// readExpiry returns the expiry timestamp currently stored for key, as a Get
+// would parse it. It fails the test if the bucket or expiry is missing.
+func readExpiry(t *testing.T, s *Store, key string) time.Time {
+	t.Helper()
+
+	var out time.Time
+	if err := s.bdb.View(func(tx *bbolt.Tx) error {
+		b := tx.Bucket([]byte(key))
+		if b == nil {
+			t.Fatalf("bucket %q missing", key)
+		}
+
+		expiry, err := time.Parse(time.RFC3339Nano, string(b.Get([]byte("expiry"))))
+		if err != nil {
+			return err
+		}
+		out = expiry
+		return nil
+	}); err != nil {
+		t.Fatalf("reading expiry for %q: %v", key, err)
+	}
+
+	return out
+}
+
+// rawData reads the raw data value for key directly, bypassing the expiry check
+// in Get so tests can observe whether a bucket physically exists. It returns nil
+// when the bucket is absent.
+func rawData(t *testing.T, s *Store, key string) []byte {
+	t.Helper()
+
+	var out []byte
+	if err := s.bdb.View(func(tx *bbolt.Tx) error {
+		b := tx.Bucket([]byte(key))
+		if b == nil {
+			return nil
+		}
+		data := b.Get([]byte("data"))
+		out = make([]byte, len(data))
+		copy(out, data)
+		return nil
+	}); err != nil {
+		t.Fatalf("reading data for %q: %v", key, err)
+	}
+
+	return out
+}
+
+// TestDeleteIfExpired guards against AWOO-015: a stale async delete scheduled by
+// an expired Get must not erase a value that was refreshed (or otherwise differs
+// from) the generation it observed.
+func TestDeleteIfExpired(t *testing.T) {
+	const key = "challenge"
+
+	for _, tt := range []struct {
+		setup       func(t *testing.T, s *Store) time.Time
+		name        string
+		wantValue   string
+		wantPresent bool
+	}{
+		{
+			name: "deletes the observed expired generation",
+			setup: func(t *testing.T, s *Store) time.Time {
+				mustSet(t, s, key, "old", -time.Minute)
+				return readExpiry(t, s, key)
+			},
+			wantPresent: false,
+		},
+		{
+			name: "preserves a refreshed generation",
+			setup: func(t *testing.T, s *Store) time.Time {
+				mustSet(t, s, key, "old", -time.Minute)
+				observed := readExpiry(t, s, key)
+				mustSet(t, s, key, "fresh", time.Hour)
+				return observed
+			},
+			wantPresent: true,
+			wantValue:   "fresh",
+		},
+		{
+			name: "skips on generation mismatch",
+			setup: func(t *testing.T, s *Store) time.Time {
+				mustSet(t, s, key, "old", -time.Minute)
+				// An expiry we never wrote: even though the stored value is
+				// expired, it is a different generation and must be left alone.
+				return time.Now().Add(-2 * time.Hour)
+			},
+			wantPresent: true,
+			wantValue:   "old",
+		},
+		{
+			name: "skips a non-expired observation",
+			setup: func(t *testing.T, s *Store) time.Time {
+				mustSet(t, s, key, "live", time.Hour)
+				return readExpiry(t, s, key)
+			},
+			wantPresent: true,
+			wantValue:   "live",
+		},
+		{
+			name: "no-op when bucket is absent",
+			setup: func(t *testing.T, s *Store) time.Time {
+				return time.Now().Add(-time.Hour)
+			},
+			wantPresent: false,
+		},
+	} {
+		t.Run(tt.name, func(t *testing.T) {
+			s := newTestStore(t)
+			observed := tt.setup(t, s)
+
+			if err := s.deleteIfExpired(t.Context(), key, observed); err != nil {
+				t.Fatalf("deleteIfExpired(%q): %v", key, err)
+			}
+
+			got := rawData(t, s, key)
+			switch {
+			case tt.wantPresent && got == nil:
+				t.Fatalf("key %q: want present with value %q, got deleted", key, tt.wantValue)
+			case tt.wantPresent && string(got) != tt.wantValue:
+				t.Errorf("key %q: want value %q, got %q", key, tt.wantValue, string(got))
+			case !tt.wantPresent && got != nil:
+				t.Errorf("key %q: want deleted, got value %q", key, string(got))
+			}
+		})
+	}
+}