thatarif
Go back

Generating timeseries data in postgresql

Apr 7, 2025

While building a feature for a web analytics project, I wanted an api which returns all the events occurred between startdate and enddate on daily basis. If there is no event on any day, the value should be 0.

Exploring postgres, I found the use of generate_series function. It can be used to generate

  1. Numeric series

    generate_series(start, stop)
    generate_series(start, stop, step)
    
    • start: The first value in the series (inclusive). Can be integer, bigint, or numeric.
    • stop: The last potential value in the series (inclusive). Must be the same type as start.
    • step (Optional): The increment between values. Defaults to 1 if omitted. Must be the same type as start and stop. Can be positive or negative.
    select generate_series(1,10) as index; -- returns 10 rows of value from 1 to 10
    
  2. Timestamp series

    generate_series(start, stop, step)
    
    • start: The starting timestamp (inclusive). Can be timestamp, timestamptz, or date (though using intervals with date is common).
    • stop: The ending timestamp (inclusive). Must be compatible with start.
    • step: The interval between timestamps. Must be of type interval.
    SELECT generate_series(
        '2025-04-01'::date,
        '2025-04-30'::date,
        interval '1 day'
    )::date AS report_date;
    

MSSQL Server also has similar function to generate these type of data, while for other DBMS RECURSIVE CTE can be used for such data.

Real use-case

Filling Gaps in Time-Series Data (using left join with actual data)

Imagine you have an events table logging occurrences with a created_at timestamp. To analyze trends over time, a common requirement is to aggregate these events daily. Specifically, we want to count how many pagevisits occurred and visitors visited the page within a given date range.

With the help of generate_series and using CTE, we can achieve this easily.

WITH date_series AS (
  SELECT generate_series(
    '2025-03-20'::date,
    '2025-03-27'::date,
    interval '1 day'
  )::date AS day
)
SELECT
  TO_CHAR(ds.day, 'YYYY-MM-DD') AS key,
  COUNT(DISTINCT evt.ip) AS visitors,
  COUNT(evt.id) AS pagevisits
FROM date_series ds
LEFT JOIN events evt ON DATE(evt.createdat) = ds.day
GROUP BY ds.day
ORDER BY ds.day;

It will return following rows

keyvisitorspagevisits
2025-03-2000
2025-03-21210
2025-03-2200
2025-03-2300
2025-03-24320
2025-03-2500
2025-03-26127
2025-03-2700

As demonstrated, generate_series is a remarkably versatile function in PostgreSQL. Whether you’re filling gaps in time-series data, creating complete periodic reports, or generating test data, it offers a concise and efficient way to create sequential data directly within your SQL queries, simplifying complex reporting and analysis tasks.

thatarif