Postgresql 中文操作指南
F.39. seg — a datatype for line segments or floating point intervals #
此模块实现了表示线段或浮点间隔的数据类型 seg。_seg_可以在区间端点表示不确定性,使其在表示实验室测量时特别有用。
This module implements a data type seg for representing line segments, or floating point intervals. seg can represent uncertainty in the interval endpoints, making it especially useful for representing laboratory measurements.
此模块被认为是“受信任的”,也就是说,它可以由在当前数据库上具有 CREATE 权限的非超级用户安装。
This module is considered “trusted”, that is, it can be installed by non-superusers who have CREATE privilege on the current database.
F.39.1. Rationale #
测量结果的几何形状通常比数值连续统一体中的一个点的几何形状更加复杂。测量结果通常是该连续区间的某个分支,其边界有点模糊。测量结果作为区间出现,原因在于存在模糊性和随机性,也可能是因为正在测量的值本来就是一个区间,它表示了某种条件,比如蛋白质的稳定温度范围。
The geometry of measurements is usually more complex than that of a point in a numeric continuum. A measurement is usually a segment of that continuum with somewhat fuzzy limits. The measurements come out as intervals because of uncertainty and randomness, as well as because the value being measured may naturally be an interval indicating some condition, such as the temperature range of stability of a protein.
仅凭常识,将这种数据存储为区间而非数字对似乎更加方便。在实践中,在大多数应用中,这样做甚至会更加高效。
Using just common sense, it appears more convenient to store such data as intervals, rather than pairs of numbers. In practice, it even turns out more efficient in most applications.
进一步遵循常识思路,边界模糊性表明使用传统的数字数据类型会导致某些信息丢失。考虑一下这种情况:你的仪器读数为 6.50,你将此读数输入数据库。当你获取它时,你会得到什么?请看:
Further along the line of common sense, the fuzziness of the limits suggests that the use of traditional numeric data types leads to a certain loss of information. Consider this: your instrument reads 6.50, and you input this reading into the database. What do you get when you fetch it? Watch:
test=> select 6.50 :: float8 as "pH";
pH
---
6.5
(1 row)
在测量领域中,6.50 与 6.5 不同。有时可能会有根本性的不同。实验人员通常会写下(并发布)他们信任的数字。6.50 实际上是包含在更大且更模糊的区间 6.5 中的模糊区间,它们之间的唯一共同特征是(可能)它们的中心点。我们当然不希望这些不同的数据项看起来是一样的。
In the world of measurements, 6.50 is not the same as 6.5. It may sometimes be critically different. The experimenters usually write down (and publish) the digits they trust. 6.50 is actually a fuzzy interval contained within a bigger and even fuzzier interval, 6.5, with their center points being (probably) the only common feature they share. We definitely do not want such different data items to appear the same.
结论?最好拥有特殊的数据类型,以尽可能可变的精度记录区间的边界。可变的意思是,每个数据元素都记录自己的精度。
Conclusion? It is nice to have a special data type that can record the limits of an interval with arbitrarily variable precision. Variable in the sense that each data element records its own precision.
请查看以下内容:
Check this out:
test=> select '6.25 .. 6.50'::seg as "pH";
pH
------------
6.25 .. 6.50
(1 row)
F.39.2. Syntax #
区间的外在表现形式使用连接了范围操作符 (.._或 _…) 的一个或两个浮点数形成。或者,它可以指定为中心点加上或减去一个偏差。可选确定性指标 (<、>_或 _~) 也可以存储。但是,所有内建操作符都会忽略确定性指标。 Table F.28概述了允许的表现形式; Table F.29显示了一些示例。
The external representation of an interval is formed using one or two floating-point numbers joined by the range operator (.. or …). Alternatively, it can be specified as a center point plus or minus a deviation. Optional certainty indicators (<, > or ~) can be stored as well. (Certainty indicators are ignored by all the built-in operators, however.) Table F.28 gives an overview of allowed representations; Table F.29 shows some examples.
在 Table F.28中,x、y_和 _delta_表示浮点数。_x_和 _y,但不是 delta,可以在前面加上一个确定性指标。
In Table F.28, x, y, and delta denote floating-point numbers. x and y, but not delta, can be preceded by a certainty indicator.
Table F.28. seg External Representations
Table F.28. seg External Representations
x |
Single value (zero-length interval) |
x_ .. _y |
Interval from x to y |
x_ (+-) _delta |
Interval from x - delta to x + delta |
_x .._ |
Open interval with lower bound x |
.. _x_ |
Open interval with upper bound x |
Table F.29. Examples of Valid seg Input
Table F.29. Examples of Valid seg Input
5.0 |
Creates a zero-length segment (a point, if you will) |
~5.0 |
Creates a zero-length segment and records ~ in the data. ~ is ignored by seg operations, but is preserved as a comment. |
<5.0 |
Creates a point at 5.0. < is ignored but is preserved as a comment. |
>5.0 |
Creates a point at 5.0. > is ignored but is preserved as a comment. |
5(+-)0.3 |
Creates an interval 4.7 .. 5.3. Note that the (+-) notation isn’t preserved. |
50 .. |
Everything that is greater than or equal to 50 |
.. 0 |
Everything that is less than or equal to 0 |
1.5e-2 .. 2E-2 |
Creates an interval 0.015 .. 0.02 |
1 … 2 |
The same as 1…2, or 1 .. 2, or 1..2 (spaces around the range operator are ignored) |
因为 … 运算符在数据源中得到了广泛的使用,因此将其允许作为 .. 运算符的备用拼写形式。不幸的是,这造成了解析模糊性:在 0…23 中,并不清楚上限是指 23 还是 0.23。这可以通过要求 seg 输入中所有数字在小数点前至少有一位数字来解决。
Because the … operator is widely used in data sources, it is allowed as an alternative spelling of the .. operator. Unfortunately, this creates a parsing ambiguity: it is not clear whether the upper bound in 0…23 is meant to be 23 or 0.23. This is resolved by requiring at least one digit before the decimal point in all numbers in seg input.
作为健全性检查,seg 拒绝下限大于上限的区间,例如 5 .. 2。
As a sanity check, seg rejects intervals with the lower bound greater than the upper, for example 5 .. 2.
F.39.3. Precision #
seg 值在内部存储为 32 位浮点数字对。这意味着,尾数超过 7 位的数字将被截断。
seg values are stored internally as pairs of 32-bit floating point numbers. This means that numbers with more than 7 significant digits will be truncated.
尾数为 7 位或更少的数字保留其原始精度。也就是说,如果你的查询返回 0.00,你可以确定尾随零并非格式化的伪影:它们反映了原始数据的精度。前导零数量不影响精度:数值 0.0067 被认为只有 2 位有效数字。
Numbers with 7 or fewer significant digits retain their original precision. That is, if your query returns 0.00, you will be sure that the trailing zeroes are not the artifacts of formatting: they reflect the precision of the original data. The number of leading zeroes does not affect precision: the value 0.0067 is considered to have just 2 significant digits.
F.39.4. Usage #
_seg_模块包括 _seg_值的 GiST 索引操作符类。GiST 操作符类支持的操作符显示在 Table F.30中。
The seg module includes a GiST index operator class for seg values. The operators supported by the GiST operator class are shown in Table F.30.
Table F.30. Seg GiST Operators
Operator Description |
seg << seg → boolean Is the first seg entirely to the left of the second? [a, b] << [c, d] is true if b < c. |
seg >> seg → boolean Is the first seg entirely to the right of the second? [a, b] >> [c, d] is true if a > d. |
seg &< seg → boolean Does the first seg not extend to the right of the second? [a, b] &< [c, d] is true if b ⇐ d. |
seg &> seg → boolean Does the first seg not extend to the left of the second? [a, b] &> [c, d] is true if a >= c. |
seg = seg → boolean Are the two _seg_s equal? |
seg && seg → boolean Do the two _seg_s overlap? |
seg @> seg → boolean Does the first seg contain the second? |
seg <@ seg → boolean Is the first seg contained in the second? |
除了上述操作符之外, Table 9.1中显示的常用比较操作符也可用于类型 seg。这些操作首先比较 (a) 和 (c),如果它们相等,则比较 (b) 和 (d)。这在大多数情况下都能产生合理良好的排序,如果您想将 ORDER BY 用于此类型,这将非常有用。
In addition to the above operators, the usual comparison operators shown in Table 9.1 are available for type seg. These operators first compare (a) to (c), and if these are equal, compare (b) to (d). That results in reasonably good sorting in most cases, which is useful if you want to use ORDER BY with this type.
F.39.5. Notes #
有关用法的示例,请参阅回归测试_sql/seg.sql_。
For examples of usage, see the regression test sql/seg.sql.
将_(+-)_转换为常规范围的机制在确定边界显着位数时并不能完全准确。例如,如果结果区间包含一个十的幂次,它会向较低边界添加一个额外的位数:
The mechanism that converts (+-) to regular ranges isn’t completely accurate in determining the number of significant digits for the boundaries. For example, it adds an extra digit to the lower boundary if the resulting interval includes a power of ten:
postgres=> select '10(+-)1'::seg as seg;
seg
---------
9.0 .. 11 -- should be: 9 .. 11
R树索引的性能在很大程度上取决于输入值的初始顺序。在_seg_列上对输入表进行排序可能非常有用;参阅 sort-segments.pl 中的脚本以获取示例。
The performance of an R-tree index can largely depend on the initial order of input values. It may be very helpful to sort the input table on the seg column; see the script sort-segments.pl for an example.
F.39.6. Credits #
原始作者:基因塞尔科夫,Jr. < link:mailto:selkovjr@mcs.anl.gov[selkovjr@mcs.anl.gov]> ,数学和计算机科学分部,阿贡国家实验室。
Original author: Gene Selkov, Jr. <link:mailto:selkovjr@mcs.anl.gov[selkovjr@mcs.anl.gov]>, Mathematics and Computer Science Division, Argonne National Laboratory.
我要感谢的主要是乔·赫勒斯坦因教授 ( https://dsf.berkeley.edu/jmh/) 阐明了 GiST ( http://gist.cs.berkeley.edu/) 的要点。我还要感谢所有过去和现在的 Postgres 开发人员,他们让我得以创造自己的世界,并在其中不受打扰地生活。我还想向阿贡实验室和美国能源部表示感谢,感谢他们多年来对我数据库研究的大力支持。
My thanks are primarily to Prof. Joe Hellerstein (https://dsf.berkeley.edu/jmh/) for elucidating the gist of the GiST (http://gist.cs.berkeley.edu/). I am also grateful to all Postgres developers, present and past, for enabling myself to create my own world and live undisturbed in it. And I would like to acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy for the years of faithful support of my database research.