SQL Server random data shuffling: selecting random rows in SQL

Repost

mob64ca13f3c9f0 2024-05-09 10:57:48

Tags: SQL Server random data shuffling, sql, performance, postgresql, random. Category: SQL Server, Database

I want a random selection of rows in PostgreSQL. I tried this:

select * from table where random() < 0.01;

But others recommend this:

select * from table order by random() limit 1000;

I have a very large table with 500 million rows, and I want it to be fast.

Which approach is better? What are the differences? What is the best way to select random rows?

Reply #1

Reference: https://stackoom.com/question/AoGO/选择随机行PostgreSQL的最佳方法

Reply #2

PostgreSQL ORDER BY random(), selecting rows in random order:

select your_columns from your_table ORDER BY random()

PostgreSQL ORDER BY random() with a DISTINCT:

select * from 
  (select distinct your_columns from your_table) table_alias
ORDER BY random()

PostgreSQL ORDER BY random(), limited to one row:

select your_columns from your_table ORDER BY random() limit 1
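
To make the cost of this pattern visible, here is a minimal Python sketch of what ORDER BY random() LIMIT n asks for: one random sort key per row, then the n rows with the smallest keys. The function name order_by_random_limit is hypothetical, and the sketch models the logic only, not PostgreSQL's actual executor.

```python
import heapq
import random

def order_by_random_limit(rows, n, rng=random):
    """Model of ORDER BY random() LIMIT n: key every row, keep the n smallest keys."""
    # Every row receives a random sort key, so the whole input is read once.
    keyed = ((rng.random(), i, row) for i, row in enumerate(rows))
    # A bounded heap keeps only n candidates in memory instead of sorting everything.
    smallest = heapq.nsmallest(n, keyed)
    return [row for _key, _i, row in smallest]

# Hypothetical table of 100,000 row ids
sample = order_by_random_limit(range(100_000), 5, random.Random(42))
```

The result always has exactly n rows (or fewer if the input is smaller), but the work grows with the size of the input because every row needs a key.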

Reply #3

Say, for example, that you don't want duplicates in the randomized values that are returned. You will then need to set a boolean value on the primary table containing your (non-randomized) set of values.

Assuming this is the input table:

id_values:

 id | used
----+-------
  1 | FALSE
  2 | FALSE
  3 | FALSE
  4 | FALSE
  5 | FALSE
 ...

Populate the id_values table as needed. Then, as described by Erwin, create a materialized view that randomizes the id_values table once:

CREATE MATERIALIZED VIEW id_values_randomized AS
  SELECT id
  FROM id_values
  ORDER BY random();

Note that the materialized view does not contain the used column, because it would quickly become out of date. Nor does the view need to contain other columns that may be in the id_values table.

To obtain (and "consume") random values, use an UPDATE ... RETURNING on id_values, selecting ids from id_values_randomized joined to id_values, and applying the desired criteria to keep only eligible rows. For example:

UPDATE id_values
SET used = TRUE
WHERE id_values.id IN 
  (SELECT i.id
    FROM id_values_randomized r INNER JOIN id_values i ON i.id = r.id
    WHERE (NOT i.used)
    LIMIT 5)
RETURNING id;

Change LIMIT as necessary: if you only need one random value at a time, change LIMIT to 1.

With the proper indexes on id_values, I believe the UPDATE ... RETURNING should execute very quickly with little load. It returns randomized values in one database round-trip. The criteria for "eligible" rows can be as complex as required. New rows can be added to the id_values table at any time, and they become accessible to the application as soon as the materialized view is refreshed (which can likely be done at an off-peak time). Creating and refreshing the materialized view will be slow, but it only needs to happen when new ids are added to the id_values table.
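
The consume-without-duplicates idea can be sketched in plain Python. This is an in-memory stand-in, not the SQL above: the class RandomIdPool and its method take are hypothetical names, the dictionary plays the role of id_values.used, and the shuffled list plays the role of id_values_randomized.

```python
import random

class RandomIdPool:
    """Hypothetical in-memory stand-in for id_values plus id_values_randomized."""

    def __init__(self, ids, rng=random):
        self.used = {i: False for i in ids}   # plays the role of id_values.used
        self.shuffled = list(self.used)       # plays the role of the materialized view
        rng.shuffle(self.shuffled)            # randomized once, like ORDER BY random()

    def take(self, limit):
        """Mark up to `limit` unused ids as used and return them (UPDATE ... RETURNING)."""
        picked = [i for i in self.shuffled if not self.used[i]][:limit]
        for i in picked:
            self.used[i] = True
        return picked

pool = RandomIdPool(range(1, 11), random.Random(7))
first = pool.take(5)    # five ids in shuffled order
second = pool.take(5)   # the remaining five, none repeated
third = pool.take(5)    # empty: every id has been consumed
```

One difference worth keeping in mind: the Python list has a fixed order, whereas a SQL subquery without ORDER BY gives no ordering guarantee, so which rows LIMIT picks from the join is up to the planner.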

Reply #4

If you want just one row, you can use a calculated offset derived from count.

select * from table_name limit 1
offset floor(random() * (select count(*) from table_name));
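
As a rough model of the arithmetic in that query, the following Python sketch computes the same offset and skips that many rows. The function name pick_by_offset is hypothetical, and a Python list stands in for the table.

```python
import math
import random

def pick_by_offset(rows, rng=random):
    """Model of LIMIT 1 OFFSET floor(random() * count(*))."""
    count = len(rows)                          # select count(*) from table_name
    if count == 0:
        return None                            # empty table: no row to return
    offset = math.floor(rng.random() * count)  # integer in 0 .. count - 1
    return rows[offset]                        # skip `offset` rows, take one

rng = random.Random(11)
picks = [pick_by_offset(list("abcde"), rng) for _ in range(200)]
```

Because random() is always below 1, the offset never reaches count, so a row is always found. In the database, though, the count and the skipped rows both have to be read, so the cost still grows with table size.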

Reply #5

Add a column called r with type serial. Index r.

Assume we have 200,000 rows. We generate a random number n, where 0 < n <= 200,000.

Select rows with r > n, sort them in ascending order, and select the smallest one.

Code:

select * from YOUR_TABLE 
where r > (
    select (
        select reltuples::bigint AS estimate
        from   pg_class
        where  oid = 'public.YOUR_TABLE'::regclass) * random()
    )
order by r asc limit(1);

At the application level, you need to execute the statement again if n is greater than the number of rows, or if you need to select multiple rows.
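
The lookup and the retry can be sketched in Python, with a sorted list standing in for the index on r. The function names pick_after_threshold and pick_with_retry and the max_tries parameter are hypothetical; the sketch only models the logic of the query.

```python
import bisect
import random

def pick_after_threshold(sorted_r, estimate, rng=random):
    """Model of WHERE r > estimate * random() ORDER BY r LIMIT 1 on an indexed column."""
    n = estimate * rng.random()               # threshold in [0, estimate)
    pos = bisect.bisect_right(sorted_r, n)    # first position with r > n
    return sorted_r[pos] if pos < len(sorted_r) else None

def pick_with_retry(sorted_r, estimate, rng=random, max_tries=100):
    """Re-run the lookup when the threshold overshoots every r value."""
    for _ in range(max_tries):
        r = pick_after_threshold(sorted_r, estimate, rng)
        if r is not None:
            return r
    return None

# r values with a gap (4..9 missing, e.g. after deletes)
rng = random.Random(1)
draws = [pick_with_retry([1, 2, 3, 10], 10, rng) for _ in range(10_000)]
```

The gap example shows a caveat of this approach: a row that follows a gap in r is returned for every threshold that falls inside the gap, so here r = 10 is picked far more often than the other three values.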

Reply #6

Starting with PostgreSQL 9.5, there's a new syntax dedicated to getting random elements from a table:

SELECT * FROM mytable TABLESAMPLE SYSTEM (5);

This example returns approximately 5% of the rows in mytable.
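
The SYSTEM method samples at the level of storage pages rather than individual rows. A minimal Python sketch of that behavior follows, assuming a hypothetical layout of 1,000 pages with 100 rows each; the function name tablesample_system is made up for illustration.

```python
import random

def tablesample_system(pages, percent, rng=random):
    """Model of TABLESAMPLE SYSTEM (percent): keep whole pages with probability percent/100."""
    p = percent / 100.0
    sample = []
    for page in pages:             # each page is a list of rows stored together
        if rng.random() < p:       # one coin flip per page, not per row
            sample.extend(page)    # a chosen page contributes all of its rows
    return sample

# 1,000 hypothetical pages of 100 rows each
pages = [[(page_no, slot) for slot in range(100)] for page_no in range(1000)]
sample = tablesample_system(pages, 5, random.Random(3))
```

Two consequences follow from the model: the number of rows returned varies from run to run, and rows stored on the same page tend to appear together. PostgreSQL also offers TABLESAMPLE BERNOULLI, which decides per row instead of per page.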
