I know that PostgreSQL, SQLite, MySQL, and Oracle have the ability to order by a random function, so you can use this in SQLAlchemy:
from sqlalchemy.sql.expression import func, select
select.order_by(func.random()) # for PostgreSQL, SQLite
select.order_by(func.rand()) # for MySQL
select.order_by('dbms_random.value') # For Oracle
Next, you need to limit the query by the number of records you need (for example using .limit()).
Bear in mind that at least in PostgreSQL, selecting random record has severe perfomance issues; here is good article about it.
If you are using the orm and the table is not big (or you have its amount of rows cached) and you want it to be database independent the really simple approach is.
import random
rand = random.randrange(0, session.query(Table).count())
row = session.query(Table)[rand]
This is cheating slightly but thats why you use an orm.
Where Table is your table (or you could put any query there).
If you want a few rows, then you can just run this multiple times, and make sure that each row is not identical to the previous.
You can easily see that using func.random() is far faster than returning all results to Python's random.choice().
Additionally, as the size of the table increases, the performance of order_by_random() will degrade significantly because an ORDER BY requires a full table scan versus the COUNT in optimized_random() can use an index.
from random import randint
rows_query = session.query(Table) # get all rows
if rows_query.count() > 0: # make sure there's at least 1 row
rand_index = randint(0,rows_query.count()-1) # get random index to rows
rand_row = rows_query.all()[rand_index] # use random index to get random row
Some SQL DBMS, namely Microsoft SQL Server, DB2, and PostgreSQL have implemented the SQL:2003 TABLESAMPLE clause. Support was added to SQLAlchemy in version 1.1. It allows returning a sample of a table using different sampling methods – the standard requires SYSTEM and BERNOULLI, which return a desired approximate percentage of a table.
# Approx. 1%, using SYSTEM method
sample1 = mytable.tablesample(1)
# Approx. 1%, using BERNOULLI method
sample2 = mytable.tablesample(func.bernoulli(1))
There's a slight gotcha when used with mapped classes: the produced TableSample object must be aliased in order to be used to query model objects:
sample = aliased(MyModel, tablesample(MyModel, 1))
res = session.query(sample).all()
Since many of the answers contain performance benchmarks, I'll include some simple tests here as well. Using a simple table in PostgreSQL with about a million rows and a single integer column, select (approx.) 1% sample:
In [24]: %%timeit
...: foo.select().\
...: order_by(func.random()).\
...: limit(select([func.round(func.count() * 0.01)]).
...: select_from(foo).
...: as_scalar()).\
...: execute().\
...: fetchall()
...:
307 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [25]: %timeit foo.tablesample(1).select().execute().fetchall()
6.36 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [26]: %timeit foo.tablesample(func.bernoulli(1)).select().execute().fetchall()
19.8 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Before rushing to use SYSTEM sampling method one should know that it samples pages, not individual tuples, so it might not be suitable for small tables, for example, and may not produce as random results, if the table is clustered.
If using a dialect that does not allow passing the sample percentage / number of rows and seed as parameters, and a driver that does not inline values, then either pass the values as literal SQL text if they are static, or inline them using a custom SQLA compiler extension:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql import TableSample
@compiles(TableSample)
def visit_tablesample(tablesample, self, asfrom=False, **kw):
""" Compile `TableSample` with values inlined.
"""
kw_literal_binds = {**kw, "literal_binds": True}
text = "%s TABLESAMPLE %s" % (
self.visit_alias(tablesample, asfrom=True, **kw),
tablesample._get_method()._compiler_dispatch(self, **kw_literal_binds),
)
if tablesample.seed is not None:
text += " REPEATABLE (%s)" % (
tablesample.seed._compiler_dispatch(self, **kw_literal_binds)
)
return text
from sqlalchemy import table, literal, text
# Static percentage
print(table("tbl").tablesample(text("5 PERCENT")))
# Compiler inlined values
print(table("tbl").tablesample(5, seed=literal(42)))
Use this simplest method
this example on choosing a random question from the database:-
#first import the random module
import random
#then choose what ever Model you want inside random.choise() method
get_questions = random.choice(Question.query.all())