environmental-rose

Pattern: Populating a collection "as needed" with multiple Queries throughout app - no "base" query.

This thread explores using various useQuery hooks throughout the app to "feed" our collections, only adding records as they're called for. We don't want to force the user to load entire db tables, since 95% of the time they will only use a small % of them. We do want to progressively build up a store and use its querying methods to performantly select from the data that has been fetched. As of 7/31 we have "manual sync update methods" (per this issue: https://github.com/TanStack/db/issues/294), though I'm not certain that quite addresses our goal, as it still seems to require the collection to be built on a default query. Within our app, we can't be certain which query will be called upon first to begin populating a given collection. In theory I'd like to do this:

import { useQuery } from '@tanstack/react-query'
import { dispatchCollection, vehicleDispatchCollection, workOrderCollection } from '@/db/dispatch.collections'

export const useDbDispatchDateQuery = (date: string) => {
  return useQuery({
    queryKey: [GET_DB_DISPATCH_DATE_QUERY_KEY, { date }],
    queryFn: async () => {
      const result = await mainGraphQLClient.request(GET_DB_DISPATCH_DATE, { date })

      // A queryFn must not resolve to undefined, so return null when there is no payload
      if (!result.dbDispatchesForDispatchDate) return null
      const { dispatches, vehicleDispatches, workOrders } = result.dbDispatchesForDispatchDate

      dispatchCollection.utils.syncUpsert(dispatches)
      vehicleDispatchCollection.utils.syncUpsert(vehicleDispatches)
      workOrderCollection.utils.syncUpsert(workOrders)

      return result.dbDispatchesForDispatchDate
    },
    enabled: !!date,
  })
}

...with the hope that it would update any records with a matching key (unique ID), and insert any new records. I'm already seeing a few related questions/issues/discord messages around a similar topic, so hopefully this thread can be a good centralized place to discuss the optimal approach for this. Pinging @Kyle Mathews since he helpfully pointed out the manual sync update.

GitHub

Add Manual Sync Updates API to @tanstack/query-db-collection · Iss...

Background The @tanstack/query-db-collection package integrates TanStack Query with TanStack DB collections, providing automatic synchronization between query results and collection state. Currentl...
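
For readers skimming the thread, the upsert behaviour hoped for above (update records whose key already exists, insert the rest) can be sketched with a plain keyed store. This is only an illustration of the semantics: `KeyedStore` and the `Dispatch` shape are hypothetical stand-ins, not TanStack DB's API, and the shallow merge is an assumption about how an update should combine old and new fields.

```typescript
// Hypothetical stand-in for a keyed collection; this is NOT TanStack DB's API.
type Keyed = { id: string };

class KeyedStore<T extends Keyed> {
  private rows = new Map<string, T>();

  // Insert rows whose id is new; shallow-merge rows whose id already exists.
  upsert(incoming: T[]): void {
    for (const row of incoming) {
      const existing = this.rows.get(row.id);
      this.rows.set(row.id, existing ? { ...existing, ...row } : row);
    }
  }

  get(id: string): T | undefined {
    return this.rows.get(id);
  }

  get size(): number {
    return this.rows.size;
  }
}

// Assumed record shape, for illustration only.
type Dispatch = { id: string; date: string; status: string };

const dispatchStore = new KeyedStore<Dispatch>();

// Whichever query resolves first seeds the store...
dispatchStore.upsert([
  { id: "d1", date: "2025-07-31", status: "scheduled" },
  { id: "d2", date: "2025-07-31", status: "scheduled" },
]);

// ...and a later, overlapping query updates d2 and adds d3.
dispatchStore.upsert([
  { id: "d2", date: "2025-07-31", status: "complete" },
  { id: "d3", date: "2025-08-01", status: "scheduled" },
]);
```

No query is the "base" query here: the order in which the two `upsert` calls run only matters for rows that overlap.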

24 Replies

conscious-sapphire•4mo ago

if you want to lazily load data as needed — check out and comment on these proposals https://github.com/TanStack/db/issues/343 & https://github.com/TanStack/db/issues/315

GitHub

Paginated / Infinite Collections · Issue #343 · TanStack/db

A common request we are receiving is to lazily load data into a "query collection" using the infinite query pattern. We need to consider how to support this in a way that is then useable ...

GitHub

Partitioned collections · Issue #315 · TanStack/db

A very common use case, and question, is how to handle collections where you don't want to download all of it. Such as issues in an issue tracker, downloading by project/status/createdData etc....

environmental-roseOP•4mo ago

Reading them now. Also you mentioned a "derived" collection that joins the query cache of multiple queries. That sounds like it could work, but I'm not seeing anything in the docs about derived collections or how to build them?

conscious-sapphire•4mo ago

also you don't want to manually dispatch stuff like that as a normal course. Queries are derived collections

conscious-sapphire•4mo ago

https://tanstack.com/db/latest/docs/overview#derived-collections

Overview | TanStack DB Docs

TanStack DB Documentation Welcome to the TanStack DB documentation. TanStack DB is a reactive client store for building super fast apps on sync. It extends TanStack Query with collections, live querie...

environmental-roseOP•4mo ago

Oh, I was thinking react query queries, not live queries.

conscious-sapphire•4mo ago

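import { createLiveQueryCollection, eq } from '@tanstack/db'

const dispatchCollection = createLiveQueryCollection({
  startSync: true,
  query: (q) =>
    // The alias given to from() must match the name destructured in where()
    q.from({ item: graphqlCollection }).where(({ item }) => eq(item.type, `dispatch`)),
})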

environmental-roseOP•4mo ago

So that shows deriving from imported collection instances, not react query queries - does that mean that I should replace my useQuery hooks with their own createCollection( queryCollectionOptions({}) ) and then use the pattern you just shared to extract the relevant data types from each of them?

conscious-sapphire•4mo ago

yup! Just convert the response into a flat array w/ a type field of some sort and then you can easily split them into their own collections
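
A minimal sketch of that flatten-then-split idea in plain TypeScript, with no TanStack DB calls involved. The payload shape, the `type` tag values, and the helper names (`flattenPayload`, `keyOf`, `splitByType`) are all assumptions made for illustration; the real GraphQL response is not shown in the thread.

```typescript
// Assumed raw shapes; the actual GraphQL payload is not shown in the thread.
type RawDispatch = { id: string; date: string };
type RawVehicleDispatch = { id: string; dispatchId: string };
type RawWorkOrder = { id: string; jobId: string };

type Payload = {
  dispatches: RawDispatch[];
  vehicleDispatches: RawVehicleDispatch[];
  workOrders: RawWorkOrder[];
};

// Each record gets a discriminating `type` field so one flat array can hold all of them.
type DispatchItem = RawDispatch & { type: "dispatch" };
type VehicleDispatchItem = RawVehicleDispatch & { type: "vehicleDispatch" };
type WorkOrderItem = RawWorkOrder & { type: "workOrder" };
type SourceItem = DispatchItem | VehicleDispatchItem | WorkOrderItem;

// Hypothetical helper: turn the nested response into one tagged, flat array.
function flattenPayload(payload: Payload): SourceItem[] {
  return [
    ...payload.dispatches.map((d): DispatchItem => ({ ...d, type: "dispatch" })),
    ...payload.vehicleDispatches.map((v): VehicleDispatchItem => ({ ...v, type: "vehicleDispatch" })),
    ...payload.workOrders.map((w): WorkOrderItem => ({ ...w, type: "workOrder" })),
  ];
}

// Ids may collide across record types, so a flat collection needs a composite key.
const keyOf = (item: SourceItem): string => `${item.type}:${item.id}`;

// Hypothetical helper: split the flat array back out by its type tag.
function splitByType(items: SourceItem[]) {
  return {
    dispatches: items.filter((i): i is DispatchItem => i.type === "dispatch"),
    vehicleDispatches: items.filter((i): i is VehicleDispatchItem => i.type === "vehicleDispatch"),
    workOrders: items.filter((i): i is WorkOrderItem => i.type === "workOrder"),
  };
}

const flat = flattenPayload({
  dispatches: [{ id: "1", date: "2025-07-31" }, { id: "2", date: "2025-07-31" }],
  vehicleDispatches: [{ id: "1", dispatchId: "1" }],
  workOrders: [{ id: "9", jobId: "job-7" }],
});
const split = splitByType(flat);
```

In the thread's setup the flat array would be what a query collection's queryFn returns, and the split would be done by live queries filtering on the type field rather than by `Array.prototype.filter`.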

environmental-roseOP•4mo ago

So when multiple collections have a given ID, presumably I'll need to write custom logic so it knows how to reconcile which one should win? (probably just comparing "modifiedAt" in most cases)

conscious-sapphire•4mo ago

what collections are you talking about?

environmental-roseOP•4mo ago

Let's say I have three queries throughout the app that all fetch datasets that include Dispatch records, and though their filtersets are unique, they likely all contain some overlap - same database rows, but fetched at different times. The derived collection would be trying to combine all Dispatch records from all three query collections, and would see the same unique ID up to three times.

conscious-sapphire•4mo ago

ah ok, then you could do a full outer join to merge the overlapping collections
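
To make the merge concrete, here is a sketch of a full outer join over two overlapping result sets, followed by the "newest modifiedAt wins" rule floated earlier in the thread. It uses plain arrays and a Map rather than TanStack DB's query builder. The `Row` shape and the helper names are assumptions, and it assumes `modifiedAt` is a string that `Date.parse` can read.

```typescript
// Assumed record shape; `modifiedAt` is taken to be an ISO 8601 timestamp.
type Row = { id: string; modifiedAt: string; status: string };
type Joined = { left?: Row; right?: Row };

// Full outer join on id: every id from either side appears exactly once.
function fullOuterJoin(left: Row[], right: Row[]): Joined[] {
  const byId = new Map<string, Joined>();
  for (const row of left) byId.set(row.id, { left: row });
  for (const row of right) byId.set(row.id, { ...byId.get(row.id), right: row });
  return Array.from(byId.values());
}

// Resolve a joined pair to one row, preferring the more recently modified copy.
function pickNewest({ left, right }: Joined): Row {
  if (left && right) {
    return Date.parse(left.modifiedAt) >= Date.parse(right.modifiedAt) ? left : right;
  }
  // At least one side is always present in a row produced by fullOuterJoin.
  return (left ?? right)!;
}

const byDate: Row[] = [
  { id: "a", modifiedAt: "2025-07-30T10:00:00Z", status: "scheduled" },
  { id: "b", modifiedAt: "2025-07-30T10:00:00Z", status: "scheduled" },
];
const byJob: Row[] = [
  { id: "b", modifiedAt: "2025-07-31T08:00:00Z", status: "complete" },
  { id: "c", modifiedAt: "2025-07-29T09:00:00Z", status: "scheduled" },
];

const merged = fullOuterJoin(byDate, byJob).map(pickNewest);
```

With three or more sources the same join can be folded over the list, merging one source at a time into the running result.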

environmental-roseOP•4mo ago

Hmm, is there a way to create a queryCollectionOptions with a dynamic param like date? Right now both enabled and queryKey can't access date.

export const dbDispatchDateQueryCollection = createCollection(
  queryCollectionOptions({
    queryKey: [GET_DB_DISPATCH_DATE_QUERY_KEY, { date }],
    queryFn: async ({ date }: { date: string }) => {
      const result = await mainGraphQLClient.request(GET_DB_DISPATCH_DATE, { date })

      return result.dbDispatchesForDispatchDate
    },
    getKey: item => item.id,
    enabled: !!date,
  })
)

conscious-sapphire•4mo ago

Make a factory function?

environmental-roseOP•4mo ago

Wouldn't that mean that the collections are destroyed when the component that calls them for a given date is unmounted? Edit: Apparently not. AI studio has informed me that I am dumb. So is this the right idea? (aside from the goofy short gcTime)

// src/lib/createCollectionFactory.ts (NEW UTILITY FILE)
import { Collection } from '@tanstack/react-db';

// The TTL should be slightly longer than the collection's gcTime to avoid race conditions.
// NOTE: CACHE_TTL is declared here but never read anywhere in this draft.
const CACHE_TTL = 6000; // 6 seconds for a 5-second gcTime

// This is a higher-order function: a function that creates a factory function.
export function createCollectionFactory<TItem, TParams extends string | number>(
  creatorFn: (params: TParams) => Collection<TItem>
) {
  const cache = new Map<TParams, Collection<TItem>>();
  // ReturnType<typeof setTimeout> type-checks in both browser and Node builds.
  const timeouts = new Map<TParams, ReturnType<typeof setTimeout>>();

  return (params: TParams): Collection<TItem> => {
    // If a cleanup timer was set for this key, cancel it because we're using it again.
    if (timeouts.has(params)) {
      clearTimeout(timeouts.get(params));
      timeouts.delete(params);
    }

    // If the collection is already in the cache, return it.
    if (cache.has(params)) {
      return cache.get(params)!;
    }

    // Otherwise, create a new collection instance.
    const newCollection = creatorFn(params);
    cache.set(params, newCollection);

    // IMPORTANT: Listen for when the collection is cleaned up by Tanstack DB's gcTime.
    // When it is, we remove it from our factory's cache to prevent memory leaks.
    const unsubscribe = newCollection.subscribe(() => {
      if (newCollection.status === 'cleaned-up') {
        cache.delete(params);
        timeouts.delete(params); // Clean up any stray timeout
        unsubscribe(); // Clean up the subscription itself
      }
    });

    return newCollection;
  };
}

Then use it like this:

// src/features/dispatch/dispatch.source-collections.ts (REVISED FACTORY)
import { createCollection } from '@tanstack/react-db';
import { queryCollectionOptions } from '@tanstack/query-db-collection';
import { createCollectionFactory } from '@/lib/createCollectionFactory'; // <-- IMPORT THE NEW UTILITY

// ... other imports and type definitions ...
type SourceItem = DispatchType | VehicleDispatchType | WorkOrderType;

// --- Factory for the Date-based Query ---
export const getSourceCollectionByDate = createCollectionFactory((date: string) => {
  return createCollection(
    queryCollectionOptions<SourceItem>({
      queryKey: ['dbDispatchDate', { date }],
      queryFn: async () => { /* ... fetch and flatten data ... */ },
      getKey: (item) => `${item.__typename}:${item.id}`,
      // You can configure gcTime here if you want it to be longer or shorter than 5s
      // gcTime: 30000, // e.g., 30 seconds
    })
  );
});

// --- Factory for the Job-based Query ---
export const getSourceCollectionByJob = createCollectionFactory((jobId: string) => {
  return createCollection(
    queryCollectionOptions<SourceItem>({
      queryKey: ['dispatchesByJob', { jobId }],
      queryFn: async () => { /* ... fetch and flatten data ... */ },
      getKey: (item) => `${item.__typename}:${item.id}`,
    })
  );
});

Then to create a unified live query, we'd need to manage the generated collections in global state, right? eg.

// src/hooks/useRegisterSourceCollection.ts
// ...imports

export type SourceItem = DispatchType | VehicleDispatchType | WorkOrderType;
export const activeSourceCollectionsAtom = atom<Collection<SourceItem>[]>([]);

// A module-level map to keep track of removal timers for each collection instance.
const removalTimers = new Map<Collection<SourceItem>, ReturnType<typeof setTimeout>>();

export const useRegisterSourceCollection = (collection: Collection<SourceItem> | null) => {
  const setActiveSourceCollections = useSetAtom(activeSourceCollectionsAtom);

  useEffect(() => {
    if (!collection) return;

    // Check if a removal timer is pending for this collection.
    // If so, the user has returned to a view using this collection before it was GC'd.
    if (removalTimers.has(collection)) {
      // Cancel the pending removal.
      clearTimeout(removalTimers.get(collection)!);
      removalTimers.delete(collection);
    }

    // Add the collection to the global registry if it's not already there.
    setActiveSourceCollections((prev) => {
      if (prev.includes(collection)) {
        return prev;
      }
      return [...prev, collection];
    });

    // On unmount, schedule the collection for removal from the global registry.
    return () => {
      // Set a timer to remove the collection after the delay.
      const timerId = setTimeout(() => {
        console.log(`Removing collection from active registry due to gcTime expiration...`);
        setActiveSourceCollections((prev) => prev.filter((c) => c !== collection));
        removalTimers.delete(collection);
      }, REMOVAL_DELAY_MS);

      removalTimers.set(collection, timerId);
    };
  }, [collection, setActiveSourceCollections]);
};

and THEN unify with a live query:

// src/features/dispatch/hooks/useUnifiedDispatches.ts
// ...imports 

export const useUnifiedDispatches = () => {
  // 1. Subscribe to the list of active source collections
  const activeSources = useAtomValue(activeSourceCollectionsAtom);

  // 2. Create a live query that depends on the list of active sources
  const { data, ...rest } = useLiveQuery(
    (q) => {
      // Handle edge cases: no sources active yet
      if (activeSources.length === 0) {
        return q.from({ empty: [] }).select(() => ({} as DispatchType));
      }

      // Dynamically build the query with full outer joins
      let query = q.from({ s0: activeSources[0] });
      const aliases = ['s0'];

      // Chain full joins for all other active sources
      for (let i = 1; i < activeSources.length; i++) {
        const alias = `s${i}`;
        aliases.push(alias);
        query = query.fullJoin({ [alias]: activeSources[i] }, (row) =>
          // Join on our unique composite key
          eq(row.s0.id, row[alias].id)
        );
      }

      // Merge the results and filter for only Dispatches
      return query
        .select((row) => {
          // Coalesce finds the first non-null value, effectively merging the rows.
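          // We reverse a copy of the aliases to prioritize sources added later.
          // (Array.prototype.reverse mutates in place, so the shared array must not be reversed directly.)
          const aliasedSources = [...aliases].reverse().map(alias => row[alias]);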
          return coalesce(...aliasedSources) as SourceItem;
        })
        .where((merged) => eq(merged.__typename, 'Dispatch'))
        .select((merged) => merged as DispatchType); // Final cast to the correct type
    },
    [activeSources] // CRITICAL: Rerun the query builder when the list of sources changes
  );

  return { data: data ?? [], ...rest };
};

Then finally create/use it:

import { getSourceCollectionByDate, getSourceCollectionByJob } from '../dispatch.source-collections';
import { useRegisterSourceCollection } from '../hooks/useRegisterSourceCollection';
import { useUnifiedDispatches } from '../hooks/useUnifiedDispatches';

function DispatchScreen({ dateString }) {
  // 1. Get the source collection instance for this specific date from the factory.
  const sourceCollection = getSourceCollectionByDate(dateString);

  // 2. Register this source collection with our global store.
  //    It will be automatically removed when this component unmounts.
  useRegisterSourceCollection(sourceCollection);
  
  // 3. Any component in the app can now use this hook to get ALL dispatches.
  const { data: allKnownDispatches, isLoading } = useUnifiedDispatches();

  // You can still perform additional client-side filtering if needed for the view
  const dispatchesForThisDate = allKnownDispatches.filter(d => d.date === dateString);

  if (isLoading) return <Spinner />;
  
  // ... render your UI with `dispatchesForThisDate`
}

// In your Job Details Component
function JobDetails({ jobId }) {
  // 1. This component gets and registers a DIFFERENT source collection.
  const sourceCollection = getSourceCollectionByJob(jobId);
  useRegisterSourceCollection(sourceCollection);

  // 2. It can also use the SAME unified hook to get all known dispatches.
  const { data: allKnownDispatches } = useUnifiedDispatches();
  
  // This component's data need is different, so it filters differently.
  const dispatchesForThisJob = allKnownDispatches.filter(d => d.workOrder?.jobId === jobId);

  // ... render UI
}

Though that's a lot of hoops to jump through just to get to use the querying/selector capability of db. To avoid the factories, I'm creating a "default" query for each table that gets the most common recent records - they'll all run at launch. Then I'll use the new utils.syncUpsert function to add the "as needed" results from my various useQuery hooks.

conscious-sapphire•4mo ago

Not at computer, so still hard to give this a close read -- but yeah, just directly writing out could definitely be easier. Using a WeakMap is probably easier than doing manual cleanup. Once nothing is using it, it'd get GCed.
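
One way to sketch the "let the garbage collector do the cleanup" idea when the factory is keyed by a string such as a date: WeakMap keys must be objects, so this sketch swaps in a regular Map holding WeakRef values plus a FinalizationRegistry to prune dead entries. This is a substitute technique, not necessarily what was posted on the GitHub issue. `createWeakFactory` is a hypothetical helper, and `create` stands in for whatever builds the collection.

```typescript
// Hypothetical helper; `create` stands in for a call such as createCollection(...).
function createWeakFactory<T extends object>(create: (key: string) => T) {
  // Values are held weakly, so the cache alone never keeps an instance alive.
  const cache = new Map<string, WeakRef<T>>();

  // After an instance is collected, drop its map entry, unless the key has
  // since been re-populated with a live instance.
  const registry = new FinalizationRegistry<string>((key) => {
    if (cache.get(key)?.deref() === undefined) cache.delete(key);
  });

  return (key: string): T => {
    const existing = cache.get(key)?.deref();
    if (existing) return existing;

    const created = create(key);
    cache.set(key, new WeakRef(created));
    registry.register(created, key);
    return created;
  };
}

// Usage with a placeholder object in place of a real collection.
let createdCount = 0;
const getByDate = createWeakFactory((date: string) => {
  createdCount += 1;
  return { date, rows: [] as string[] };
});

const first = getByDate("2025-07-31");
const again = getByDate("2025-07-31"); // same instance while `first` is still referenced
const other = getByDate("2025-08-01"); // different key, new instance
```

The trade-off is that eviction timing is up to the engine: an unreferenced instance may linger for a while, so this replaces the manual timers but gives no guarantee about when memory is released.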

environmental-roseOP•4mo ago

I commented with an attempt at implementing the WeakMap on the partitioned collections issue: https://github.com/TanStack/db/issues/315#issuecomment-3145541811

conscious-sapphire•4mo ago

Nice! Yeah, that looks like the right pattern

environmental-roseOP•4mo ago

Is it correct to interpret the lack of examples in the docs as "we really want to nudge you in the direction of downloading it all to the client and selecting client-side"?

conscious-sapphire•4mo ago

no just working on official patterns — see https://github.com/TanStack/db/issues/315 & https://github.com/TanStack/db/issues/343

environmental-roseOP•4mo ago

Yeah, I see that the desire for the functionality is acknowledged in those. I'm more trying to put myself in the same frame of mind as the devs/creators, to fully grasp the intent of how and why it's been designed the way it has.

conscious-sapphire•4mo ago

well it's just do one thing at a time 😆

environmental-roseOP•4mo ago

in terms of expanding the library's features?

conscious-sapphire•4mo ago

right nothing is born fully formed
