I have a series of values in a database that I need to pull to create a line chart. Because i dont require high resolution I would like to resample the data by selecting every 5th row from the database.
Since you said you're using MySQL, you can use user variables to create a continuous row numbering. You do have to put that in a derived table (subquery) though.
SET @x := 0;
SELECT *
FROM (SELECT (@x:=@x+1) AS x, mt.* FROM mytable mt ORDER BY RAND()) t
WHERE x MOD 5 = 0;
I added ORDER BY RAND() to get a pseudorandom sampling, instead of allowing every fifth row of the unordered table to be in the sample every time.
An anonymous user tried to edit this to change x MOD 5 = 0 to x MOD 5 = 1. I have changed it back to my original.
For the record, one can use any value between 0 and 4 in that condition, and there's no reason to prefer one value over another.
I had been looking for something like this. The answer of Taylor and Bill led me to improve upon their ideas.
table data1 has fields read_date, value
we want to select every 2d record from a query limited by a read_date range
the name of the derived table is arbitrary and here is called DT
query:
SET @row := 0;
SELECT * FROM ( SELECT @row := @row +1 AS rownum, read_date, value FROM data1
WHERE read_date>= 1279771200 AND read_date <= 1281844740 ) as DT WHERE MOD(rownum,2)=0
WITH ordering AS (
SELECT ROW_NUMBER() OVER (ORDER BY name) AS n, example.*
FROM example ORDER BY name
)
SELECT * FROM ordering WHERE MOD(n, 5) = 0;
Conceptually, this creates a temporary table with the contents of the example table ordered by the name field, adds an additional field called n which is the row number, and then fetches only those rows with numbers which are exactly divisible by 5, i.e. every 5th row. In practice, the database engine is often able to optimise this better than that. But even if it doesn't optimise it any further, I think it's clearer than using user variables iteratively as you had to in earlier versions of MySQL.