Postgresql 中文操作指南
38.12. User-Defined Aggregates #
PostgreSQL 中的聚合函数根据 state values 和 state transition functions 定义。也就是说,聚合使用在处理每个连续输入行时更新的状态值来操作。要定义一个新的聚合函数,可以为状态值选择数据类型,为状态选择初始值,并选择状态转换函数。状态转换函数获取当前行的上一个状态值和聚合的输入值,并返回一个新的状态值。如果聚合的期望结果与需要保存在正在运行的状态值中的数据不同,也可以指定一个 final function。最终函数获取结束的状态值并返回所需的聚合结果。原则上,转换函数和最终函数都只是普通的函数,在聚合上下文中也可使用。(实际上,出于性能原因,创建仅在作为聚合的一部分时才能工作的专用转换函数往往很有帮助。)
Aggregate functions in PostgreSQL are defined in terms of state values and state transition functions. That is, an aggregate operates using a state value that is updated as each successive input row is processed. To define a new aggregate function, one selects a data type for the state value, an initial value for the state, and a state transition function. The state transition function takes the previous state value and the aggregate’s input value(s) for the current row, and returns a new state value. A final function can also be specified, in case the desired result of the aggregate is different from the data that needs to be kept in the running state value. The final function takes the ending state value and returns whatever is wanted as the aggregate result. In principle, the transition and final functions are just ordinary functions that could also be used outside the context of the aggregate. (In practice, it’s often helpful for performance reasons to create specialized transition functions that can only work when called as part of an aggregate.)
因此,除了聚合使用者的参数和结果数据类型外,还有一个内部状态值数据类型,该类型可能与参数和结果类型都不同。
Thus, in addition to the argument and result data types seen by a user of the aggregate, there is an internal state-value data type that might be different from both the argument and result types.
如果我们定义不使用最终函数的聚合,我们便获得了计算每行列值的运行函数的聚合。sum 是这种聚合的典型示例。sum 从零开始,并始终将当前行的值添加到其运行总计中。例如,如果我们想创建 sum 聚合以处理复杂数字的数据类型,我们只需要该数据类型的加法函数。聚合定义如下:
If we define an aggregate that does not use a final function, we have an aggregate that computes a running function of the column values from each row. sum is an example of this kind of aggregate. sum starts at zero and always adds the current row’s value to its running total. For example, if we want to make a sum aggregate to work on a data type for complex numbers, we only need the addition function for that data type. The aggregate definition would be:
CREATE AGGREGATE sum (complex)
(
sfunc = complex_add,
stype = complex,
initcond = '(0,0)'
);
我们可以这样使用它:
which we might use like this:
SELECT sum(a) FROM test_complex;
sum
-----------
(34,53.9)
(请注意,我们使用函数重载:有多个名为 sum 的聚合,但 PostgreSQL 可以找出哪种求和适用于类型 complex 的列。)
(Notice that we are relying on function overloading: there is more than one aggregate named sum, but PostgreSQL can figure out which kind of sum applies to a column of type complex.)
如果不存在非空输入值,上述 sum 定义将返回零(初始状态值)。或许我们希望在这种情况下返回 null——SQL 标准期望 sum 以这种方式工作。我们可以通过简单地省略 initcond 语句来做到这一点,以便初始状态值为 null。通常情况下,这意味着 sfunc 需要检查 null 状态值输入。但是,对于 sum 和一些其他简单的聚合(例如 max 和 min),将第一个非空输入值插入到状态变量中,然后从第二个非空输入值开始应用转换函数就足够了。如果初始状态值为 null 并且转换函数标记为“严格”(即不会因空输入而被调用),PostgreSQL 将自动执行此操作。
The above definition of sum will return zero (the initial state value) if there are no nonnull input values. Perhaps we want to return null in that case instead — the SQL standard expects sum to behave that way. We can do this simply by omitting the initcond phrase, so that the initial state value is null. Ordinarily this would mean that the sfunc would need to check for a null state-value input. But for sum and some other simple aggregates like max and min, it is sufficient to insert the first nonnull input value into the state variable and then start applying the transition function at the second nonnull input value. PostgreSQL will do that automatically if the initial state value is null and the transition function is marked “strict” (i.e., not to be called for null inputs).
“严格”转换函数的另一个默认行为是,每当遇到空输入值时,前一个状态值将保持不变。因此,空值将被忽略。如果您需要其他空输入行为,请勿将转换函数声明为严格;而是对其进行编码以测试空输入并执行所需的操作。
Another bit of default behavior for a “strict” transition function is that the previous state value is retained unchanged whenever a null input value is encountered. Thus, null values are ignored. If you need some other behavior for null inputs, do not declare your transition function as strict; instead code it to test for null inputs and do whatever is needed.
avg(平均值)是一个更复杂的聚合示例。它需要两个运行状态值:输入的总和和输入数量的计数。最终结果是通过除以这些数量获得的。平均值通常通过使用数组作为状态值来实现。例如,avg(float8) 的内置实现如下所示:
avg (average) is a more complex example of an aggregate. It requires two pieces of running state: the sum of the inputs and the count of the number of inputs. The final result is obtained by dividing these quantities. Average is typically implemented by using an array as the state value. For example, the built-in implementation of avg(float8) looks like:
CREATE AGGREGATE avg (float8)
(
sfunc = float8_accum,
stype = float8[],
finalfunc = float8_avg,
initcond = '{0,0,0}'
);
Note
float8_accum 需要一个三元素数组,而不仅仅是两个元素,因为它累积了输入的平方和以及输入的总和和计数。这是为了让它能够用于 avg 以及其他一些聚合。
float8_accum requires a three-element array, not just two elements, because it accumulates the sum of squares as well as the sum and count of the inputs. This is so that it can be used for some other aggregates as well as avg.
SQL 中的聚合函数调用允许 DISTINCT 和 ORDER BY 选项,这些选项控制哪些行传送给聚合的转换函数以及按什么顺序传送。这些选项在后台实现,且不受聚合支持函数的约束。
Aggregate function calls in SQL allow DISTINCT and ORDER BY options that control which rows are fed to the aggregate’s transition function and in what order. These options are implemented behind the scenes and are not the concern of the aggregate’s support functions.
有关更多详细信息,请参阅 CREATE AGGREGATE 命令。
For further details see the CREATE AGGREGATE command.
38.12.1. Moving-Aggregate Mode #
聚合函数可以有选择地支持 moving-aggregate mode,这允许在移动帧起点处在窗口中极大地加快聚合函数的执行速度。(有关使用聚合函数作为窗口函数的信息,请参见 Section 3.5和 Section 4.2.8)。基本思想是,除了普通的“前向”转换函数之外,聚合函数还提供一个 inverse transition function,当行退出窗口帧时,它允许从聚合函数的运行状态值中删除行。例如,一个用加法作前向转换函数的 _sum_聚合函数,将减法用作逆转换函数。如果没有逆转换函数,则窗口函数机制在每次帧起点移动时必须重新计算聚合函数,导致运行时间与输入行数乘以平均帧长度成正比。使用逆转换函数时,运行时间仅与输入行数成正比。
Aggregate functions can optionally support moving-aggregate mode, which allows substantially faster execution of aggregate functions within windows with moving frame starting points. (See Section 3.5 and Section 4.2.8 for information about use of aggregate functions as window functions.) The basic idea is that in addition to a normal “forward” transition function, the aggregate provides an inverse transition function, which allows rows to be removed from the aggregate’s running state value when they exit the window frame. For example a sum aggregate, which uses addition as the forward transition function, would use subtraction as the inverse transition function. Without an inverse transition function, the window function mechanism must recalculate the aggregate from scratch each time the frame starting point moves, resulting in run time proportional to the number of input rows times the average frame length. With an inverse transition function, the run time is only proportional to the number of input rows.
反向转换函数传递给当前状态值以及当前状态中包括的最早行的聚合输入值。它必须重建给定输入行从未被聚合,而只有其之后行时状态值会是什么。有时,这需要向前转换函数保留比纯聚合模式所需的更多状态。因此,移动聚合模式使用与纯模式完全不同的实现:如有需要,它有自己的状态数据类型、向前转换函数和最终函数。如果不需要额外的状态,这些数据类型和函数可以与纯模式相同。
The inverse transition function is passed the current state value and the aggregate input value(s) for the earliest row included in the current state. It must reconstruct what the state value would have been if the given input row had never been aggregated, but only the rows following it. This sometimes requires that the forward transition function keep more state than is needed for plain aggregation mode. Therefore, the moving-aggregate mode uses a completely separate implementation from the plain mode: it has its own state data type, its own forward transition function, and its own final function if needed. These can be the same as the plain mode’s data type and functions, if there is no need for extra state.
例如,我们可以将上面给出的 sum 聚合扩展为支持移动聚合模式,方法如下:
As an example, we could extend the sum aggregate given above to support moving-aggregate mode like this:
CREATE AGGREGATE sum (complex)
(
sfunc = complex_add,
stype = complex,
initcond = '(0,0)',
msfunc = complex_add,
minvfunc = complex_sub,
mstype = complex,
minitcond = '(0,0)'
);
名称以 m 开头的参数定义了移动聚合实现。除了反向转换函数 minvfunc 外,它们与没有 m 的纯聚合参数相对应。
The parameters whose names begin with m define the moving-aggregate implementation. Except for the inverse transition function minvfunc, they correspond to the plain-aggregate parameters without m.
不允许移动聚合模式的向前转换函数将 null 作为新的状态值返回。如果反向转换函数返回 null,则将其视为反向函数无法为该特定输入逆转状态计算的指示,因此将为当前帧起始位置从头重新执行聚合计算。此惯例允许在某些情况下使用移动聚合模式,其中有一些不适合从运行状态值中逆转的罕见情况。反向转换函数可以在这些情况下“解除”,但仍然可以领先,只要它适用于大多数情况即可。例如,使用浮点数工作的聚合可能选择在 NaN(不是数字)输入必须从运行状态值中删除时解除。
The forward transition function for moving-aggregate mode is not allowed to return null as the new state value. If the inverse transition function returns null, this is taken as an indication that the inverse function cannot reverse the state calculation for this particular input, and so the aggregate calculation will be redone from scratch for the current frame starting position. This convention allows moving-aggregate mode to be used in situations where there are some infrequent cases that are impractical to reverse out of the running state value. The inverse transition function can “punt” on these cases, and yet still come out ahead so long as it can work for most cases. As an example, an aggregate working with floating-point numbers might choose to punt when a NaN (not a number) input has to be removed from the running state value.
在编写移动聚合支持函数时,重要的是确保反向转换函数可以准确地重建正确状态值。否则,根据是否使用了移动聚合模式,在结果中可能存在用户可见的差异。聚合函数的示例是,最初为其添加反向转换函数似乎很简单,但此要求无法得到满足,例如 sum 超过 float4 或 float8 输入。sum(_float8) 的天真声明可能是
When writing moving-aggregate support functions, it is important to be sure that the inverse transition function can reconstruct the correct state value exactly. Otherwise there might be user-visible differences in results depending on whether the moving-aggregate mode is used. An example of an aggregate for which adding an inverse transition function seems easy at first, yet where this requirement cannot be met is sum over float4 or float8 inputs. A naive declaration of sum(_float8)_ could be
CREATE AGGREGATE unsafe_sum (float8)
(
stype = float8,
sfunc = float8pl,
mstype = float8,
msfunc = float8pl,
minvfunc = float8mi
);
但是,此聚合给出的结果可能与没有反向转换函数时给出的结果大不相同。例如,考虑
This aggregate, however, can give wildly different results than it would have without the inverse transition function. For example, consider
SELECT
unsafe_sum(x) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM (VALUES (1, 1.0e20::float8),
(2, 1.0::float8)) AS v (n,x);
此查询返回 0 作为其第二个结果,而不是预期答案 1。原因是浮点值的精度有限:将 1 添加到 1e20 再一次产生 1e20,因此从中减去 1e20 会产生 0,而不是 1。请注意,这是一个浮点运算的限制,而不是 PostgreSQL 的限制。
This query returns 0 as its second result, rather than the expected answer of 1. The cause is the limited precision of floating-point values: adding 1 to 1e20 results in 1e20 again, and so subtracting 1e20 from that yields 0, not 1. Note that this is a limitation of floating-point arithmetic in general, not a limitation of PostgreSQL.
38.12.2. Polymorphic and Variadic Aggregates #
聚合函数可以使用多态状态转换函数或最终函数,以便同样的函数可以用来实现多个聚合。有关多态函数的说明,请参见 Section 38.2.5。更进一步,聚合函数本身可以用多态输入类型和状态类型指定,允许单个聚合定义服务于多个输入数据类型。以下是多态聚合的一个示例:
Aggregate functions can use polymorphic state transition functions or final functions, so that the same functions can be used to implement multiple aggregates. See Section 38.2.5 for an explanation of polymorphic functions. Going a step further, the aggregate function itself can be specified with polymorphic input type(s) and state type, allowing a single aggregate definition to serve for multiple input data types. Here is an example of a polymorphic aggregate:
CREATE AGGREGATE array_accum (anycompatible)
(
sfunc = array_append,
stype = anycompatiblearray,
initcond = '{}'
);
在这里,任何给定聚合调用的实际状态类型是具有实际输入类型作为元素的数组类型。聚合的行为是将所有输入连接成该类型的数组。(注意:内置聚合 array_agg 提供了类似的功能,其性能比此定义会有的性能更好。)
Here, the actual state type for any given aggregate call is the array type having the actual input type as elements. The behavior of the aggregate is to concatenate all the inputs into an array of that type. (Note: the built-in aggregate array_agg provides similar functionality, with better performance than this definition would have.)
以下是使用两个不同的实际数据类型作为参数的输出:
Here’s the output using two different actual data types as arguments:
SELECT attrelid::regclass, array_accum(attname)
FROM pg_attribute
WHERE attnum > 0 AND attrelid = 'pg_tablespace'::regclass
GROUP BY attrelid;
attrelid | array_accum
---------------+---------------------------------------
pg_tablespace | {spcname,spcowner,spcacl,spcoptions}
(1 row)
SELECT attrelid::regclass, array_accum(atttypid::regtype)
FROM pg_attribute
WHERE attnum > 0 AND attrelid = 'pg_tablespace'::regclass
GROUP BY attrelid;
attrelid | array_accum
---------------+---------------------------
pg_tablespace | {name,oid,aclitem[],text[]}
(1 row)
通常,具有多态结果类型的聚合函数具有多态状态类型,如上例所示。这是必需的,因为否则不能明智地声明最终函数:它需要具有多态结果类型,但没有多态参数类型,而 CREATE FUNCTION 会拒绝,理由是无法从调用中推断出结果类型。但有时使用多态状态类型不方便。最常见的情况是,聚合支持函数是用 C 编写的,并且状态类型应声明为 internal,因为没有 SQL 级等效项。为解决这种情况,可以将最终函数声明为获取与聚合输入参数相匹配的额外“虚拟”参数。此类虚拟参数总是作为 null 值传递,因为当调用最终函数时没有特定的值可用。它们唯一的用途是允许将多态最终函数的结果类型连接到聚合的输入类型。例如,内置聚合 array_agg 的定义等价于
Ordinarily, an aggregate function with a polymorphic result type has a polymorphic state type, as in the above example. This is necessary because otherwise the final function cannot be declared sensibly: it would need to have a polymorphic result type but no polymorphic argument type, which CREATE FUNCTION will reject on the grounds that the result type cannot be deduced from a call. But sometimes it is inconvenient to use a polymorphic state type. The most common case is where the aggregate support functions are to be written in C and the state type should be declared as internal because there is no SQL-level equivalent for it. To address this case, it is possible to declare the final function as taking extra “dummy” arguments that match the input arguments of the aggregate. Such dummy arguments are always passed as null values since no specific value is available when the final function is called. Their only use is to allow a polymorphic final function’s result type to be connected to the aggregate’s input type(s). For example, the definition of the built-in aggregate array_agg is equivalent to
CREATE FUNCTION array_agg_transfn(internal, anynonarray)
RETURNS internal ...;
CREATE FUNCTION array_agg_finalfn(internal, anynonarray)
RETURNS anyarray ...;
CREATE AGGREGATE array_agg (anynonarray)
(
sfunc = array_agg_transfn,
stype = internal,
finalfunc = array_agg_finalfn,
finalfunc_extra
);
这里,finalfunc_extra 选项指定最终函数在状态值之外接收与聚合的输入参数相对应的额外虚拟参数。额外的 anynonarray 参数允许 array_agg_finalfn 的声明有效。
Here, the finalfunc_extra option specifies that the final function receives, in addition to the state value, extra dummy argument(s) corresponding to the aggregate’s input argument(s). The extra anynonarray argument allows the declaration of array_agg_finalfn to be valid.
聚合函数可以通过将它的最后一个参数声明为 VARIADIC_数组来接受可变数量的参数,这与普通函数非常相似;请参见 Section 38.5.6。聚合函数的转换函数必须与其最后一个参数是同类型的数组。转换函数通常也将被标记为 _VARIADIC,但这不是严格要求的。
An aggregate function can be made to accept a varying number of arguments by declaring its last argument as a VARIADIC array, in much the same fashion as for regular functions; see Section 38.5.6. The aggregate’s transition function(s) must have the same array type as their last argument. The transition function(s) typically would also be marked VARIADIC, but this is not strictly required.
Note
可变聚合通常与_ORDER BY_选项一起错误使用(见 Section 4.2.7),因为在这种情况下,解析器无法判断是否给定了错误数量的实际参数。请记住,_ORDER BY_右侧的所有内容都是排序键,而不是聚合参数。例如,在
Variadic aggregates are easily misused in connection with the ORDER BY option (see Section 4.2.7), since the parser cannot tell whether the wrong number of actual arguments have been given in such a combination. Keep in mind that everything to the right of ORDER BY is a sort key, not an argument to the aggregate. For example, in
SELECT myaggregate(a ORDER BY a, b, c) FROM ...
解析器会将其视为单个聚合函数参数和三个排序键。但是,用户可能意在
the parser will see this as a single aggregate function argument and three sort keys. However, the user might have intended
SELECT myaggregate(a, b, c ORDER BY a) FROM ...
如果 myaggregate 是可变参数的,这两个调用都可能是完全有效的。
If myaggregate is variadic, both these calls could be perfectly valid.
出于同样的原因,在创建具有相同名称和不同数量的常规参数的聚合函数之前,最好三思而后行。
For the same reason, it’s wise to think twice before creating aggregate functions with the same names and different numbers of regular arguments.
38.12.3. Ordered-Set Aggregates #
到目前为止我们一直描述的聚合是“普通”聚合。PostgreSQL 还支持 ordered-set aggregates,其在两个关键点上不同于普通聚合。首先,除了针对每个输入行评估一次的普通聚合参数外,有序集合聚合还可以有“直接”参数,而这些参数仅针对每个聚合操作评估一次。其次,普通聚合参数的语法明确指定了它们的排序顺序。有序集合聚合通常用于实现依赖于特定行排序的计算,例如排名或百分位数,因此排序顺序是任何调用的必需方面。例如,内置定义的 percentile_disc 等价于:
The aggregates we have been describing so far are “normal” aggregates. PostgreSQL also supports ordered-set aggregates, which differ from normal aggregates in two key ways. First, in addition to ordinary aggregated arguments that are evaluated once per input row, an ordered-set aggregate can have “direct” arguments that are evaluated only once per aggregation operation. Second, the syntax for the ordinary aggregated arguments specifies a sort ordering for them explicitly. An ordered-set aggregate is usually used to implement a computation that depends on a specific row ordering, for instance rank or percentile, so that the sort ordering is a required aspect of any call. For example, the built-in definition of percentile_disc is equivalent to:
CREATE FUNCTION ordered_set_transition(internal, anyelement)
RETURNS internal ...;
CREATE FUNCTION percentile_disc_final(internal, float8, anyelement)
RETURNS anyelement ...;
CREATE AGGREGATE percentile_disc (float8 ORDER BY anyelement)
(
sfunc = ordered_set_transition,
stype = internal,
finalfunc = percentile_disc_final,
finalfunc_extra
);
此聚合接受 float8 直接参数(百分位数)和可以是任何可排序数据类型的聚合输入。它可用于获取此类平均家庭收入:
This aggregate takes a float8 direct argument (the percentile fraction) and an aggregated input that can be of any sortable data type. It could be used to obtain a median household income like this:
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
percentile_disc
-----------------
50489
这里,0.5 是直接参数;百分位数作为跨行变化的值是没有意义的。
Here, 0.5 is a direct argument; it would make no sense for the percentile fraction to be a value varying across rows.
与普通聚合不同,对有序集聚合的输入行的排序 not 是在后台进行的,但这是聚合支持函数的责任。典型的实现方法是在聚合的状态值中保留对“元组排序”对象的引用,将传入行送入该对象,然后完成排序并在最终函数中读出数据。这种设计允许最终函数执行特殊操作,例如将额外的“假设”行注入到要排序的数据中。虽然通常可以使用用 PL/pgSQL 或其他 PL 语言编写的支持函数实现普通聚合,但是通常必须用 C 来编写有序集聚合,因为它们的状态值不能定义为任何 SQL 数据类型。(在上面的示例中,请注意状态值声明为类型 internal ——这是典型的状态值。)此外,由于最终函数执行排序,因此不能通过再次执行转换函数来继续添加输入行。这意味着最终函数 不是 READ_ONLY ;它必须在 CREATE AGGREGATE 中声明为 READ_WRITE ,或者如果额外的最终函数调用可能使用已经排序的状态,则声明为 SHAREABLE 。
Unlike the case for normal aggregates, the sorting of input rows for an ordered-set aggregate is not done behind the scenes, but is the responsibility of the aggregate’s support functions. The typical implementation approach is to keep a reference to a “tuplesort” object in the aggregate’s state value, feed the incoming rows into that object, and then complete the sorting and read out the data in the final function. This design allows the final function to perform special operations such as injecting additional “hypothetical” rows into the data to be sorted. While normal aggregates can often be implemented with support functions written in PL/pgSQL or another PL language, ordered-set aggregates generally have to be written in C, since their state values aren’t definable as any SQL data type. (In the above example, notice that the state value is declared as type internal — this is typical.) Also, because the final function performs the sort, it is not possible to continue adding input rows by executing the transition function again later. This means the final function is not READ_ONLY; it must be declared in CREATE AGGREGATE as READ_WRITE, or as SHAREABLE if it’s possible for additional final-function calls to make use of the already-sorted state.
有序集合聚合的状态转换函数接收当前状态值,再加上每行的聚合输入值,并返回更新的状态值。这与普通聚合的定义相同,但请注意,不提供直接参数(如果有)。最终函数接收最后一个状态值、直接参数的值(如果有)(如果指定 finalfunc_extra),则接收与聚合输入相对应的空值。与普通聚合一样,只有聚合是多态的时,finalfunc_extra 才有真正的用处;那么需要额外的虚拟参数将最终函数的结果类型连接到聚合的输入类型。
The state transition function for an ordered-set aggregate receives the current state value plus the aggregated input values for each row, and returns the updated state value. This is the same definition as for normal aggregates, but note that the direct arguments (if any) are not provided. The final function receives the last state value, the values of the direct arguments if any, and (if finalfunc_extra is specified) null values corresponding to the aggregated input(s). As with normal aggregates, finalfunc_extra is only really useful if the aggregate is polymorphic; then the extra dummy argument(s) are needed to connect the final function’s result type to the aggregate’s input type(s).
目前,有序集合聚合不能用作窗口函数,因此它们不需要支持移动聚合模式。
Currently, ordered-set aggregates cannot be used as window functions, and therefore there is no need for them to support moving-aggregate mode.
38.12.4. Partial Aggregation #
或者,聚合函数可以支持 partial aggregation。部分聚合的思想是独立地对输入数据的不同子集运行聚合的状态转换函数,然后将这些子集产生的状态值合并起来以生成与扫描所有输入在单个操作中产生的相同的状态值。此模式可用于并行聚合,让不同的工作进程扫描表格的不同部分。每个工作进程产生一个部分状态值,并且在最后这些状态值被合并以产生一个最终状态值。(将来,此模式也可能用于将本地和远程表格上的聚合结合起来等目的;但这尚未实现。)
Optionally, an aggregate function can support partial aggregation. The idea of partial aggregation is to run the aggregate’s state transition function over different subsets of the input data independently, and then to combine the state values resulting from those subsets to produce the same state value that would have resulted from scanning all the input in a single operation. This mode can be used for parallel aggregation by having different worker processes scan different portions of a table. Each worker produces a partial state value, and at the end those state values are combined to produce a final state value. (In the future this mode might also be used for purposes such as combining aggregations over local and remote tables; but that is not implemented yet.)
为了支持部分聚合,聚合定义必须提供一个 combine function,它取聚合的状态类型的两个值(表示聚合输入行的两个子集的结果),并产生状态类型的新值,表示聚合两个行集组合后的状态。两个集合的输入行的相对顺序未指定。这意味着通常不可能为对输入行顺序敏感的聚合定义一个有用的合并函数。
To support partial aggregation, the aggregate definition must provide a combine function, which takes two values of the aggregate’s state type (representing the results of aggregating over two subsets of the input rows) and produces a new value of the state type, representing what the state would have been after aggregating over the combination of those sets of rows. It is unspecified what the relative order of the input rows from the two sets would have been. This means that it’s usually impossible to define a useful combine function for aggregates that are sensitive to input row order.
作为简单的示例,可以通过指定 combine 函数与用作转换函数的相同的两个较大或两个较小的比较函数,使 MAX 和 MIN 聚合支持部分聚合。SUM 聚合只需要一个加法函数作为 combine 函数。(同样,这与它们的转换函数相同,除非状态值比输入数据类型更宽。)
As simple examples, MAX and MIN aggregates can be made to support partial aggregation by specifying the combine function as the same greater-of-two or lesser-of-two comparison function that is used as their transition function. SUM aggregates just need an addition function as combine function. (Again, this is the same as their transition function, unless the state value is wider than the input data type.)
combine 函数的处理方式非常类似于转换函数,恰好使用状态类型的作为其第二个参数,而不是基础输入类型。具体而言,处理空值和严格函数的规则类似。此外,如果聚合定义指定非空 initcond,请记住不仅将其用作每个部分聚合运行的初始状态,还用作 combine 函数的初始状态,该函数将被调用以将每个部分结果组合到该状态中。
The combine function is treated much like a transition function that happens to take a value of the state type, not of the underlying input type, as its second argument. In particular, the rules for dealing with null values and strict functions are similar. Also, if the aggregate definition specifies a non-null initcond, keep in mind that that will be used not only as the initial state for each partial aggregation run, but also as the initial state for the combine function, which will be called to combine each partial result into that state.
如果聚合的状态类型被声明为 internal,则 combine 函数有责任将其结果分配到聚合状态值的正确内存上下文中。这意味着,特别是当第一个输入是 NULL 时,简单地返回第二个输入是无效的,因为该值将在错误的上下文中,并且没有足够的生命周期。
If the aggregate’s state type is declared as internal, it is the combine function’s responsibility that its result is allocated in the correct memory context for aggregate state values. This means in particular that when the first input is NULL it’s invalid to simply return the second input, as that value will be in the wrong context and will not have sufficient lifespan.
当聚合的状态类型声明为_internal_时,通常也适合聚合定义提供一个_serialization function_和一个_deserialization function_,这样状态值就可以从一个进程复制到另一个进程。如果没有这些函数,不能执行并行聚合,而本地/远程聚合等未来应用程序也可能无法工作。
When the aggregate’s state type is declared as internal, it is usually also appropriate for the aggregate definition to provide a serialization function and a deserialization function, which allow such a state value to be copied from one process to another. Without these functions, parallel aggregation cannot be performed, and future applications such as local/remote aggregation will probably not work either.
序列化函数必须接受一个类型为_internal_的单个参数,并返回一个类型为_bytea_的结果,这个结果表示打包在扁平字节块中的状态值。相反,反序列化函数逆转这个转换。它必须接受两个类型为_bytea_和_internal_的参数,并返回一个类型为_internal_的结果。(第二个参数未使用,总是为零,但出于类型安全原因需要它。)反序列化函数的结果应该简单地分配在当前内存上下文中,因为它与组合函数的结果不同,不是持久的。
A serialization function must take a single argument of type internal and return a result of type bytea, which represents the state value packaged up into a flat blob of bytes. Conversely, a deserialization function reverses that conversion. It must take two arguments of types bytea and internal, and return a result of type internal. (The second argument is unused and is always zero, but it is required for type-safety reasons.) The result of the deserialization function should simply be allocated in the current memory context, as unlike the combine function’s result, it is not long-lived.
还需要注意的是,要并行执行聚合,聚合本身必须标记为_PARALLEL SAFE_。不会查询其支持函数上的并行安全标记。
Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL SAFE. The parallel-safety markings on its support functions are not consulted.
38.12.5. Support Functions for Aggregates #
以 C 编写的函数可以通过调用_AggCheckCallContext_来检测它是否被调用为聚合支持函数,例如:
A function written in C can detect that it is being called as an aggregate support function by calling AggCheckCallContext, for example:
if (AggCheckCallContext(fcinfo, NULL))
进行此检查的一个原因是,当为 true 时,第一个输入必须是临时状态值,因此可以安全地对它进行修改,而不是分配新副本。有关一个示例,请参阅 int8inc() 。(虽然始终允许聚合转换函数修改转换值,但通常不建议聚合最终函数这样做;如果它们这样做,则在创建聚合时必须声明该行为。有关更多详细信息,请参阅 CREATE AGGREGATE 。)
One reason for checking this is that when it is true, the first input must be a temporary state value and can therefore safely be modified in-place rather than allocating a new copy. See int8inc() for an example. (While aggregate transition functions are always allowed to modify the transition value in-place, aggregate final functions are generally discouraged from doing so; if they do so, the behavior must be declared when creating the aggregate. See CREATE AGGREGATE for more detail.)
AggCheckCallContext_的第二个参数可用于检索保留聚合状态值所在的内存环境。这对想要用“扩展”对象(请参见 Section 38.13.1)作为其状态值的状态转换函数很有用。在第一次调用时,转换函数应该返回一个其内存环境是聚合状态环境的子代的扩展对象,然后在后续的调用中继续返回同一个扩展对象。有关示例,请参见 _array_append()。(_array_append()_不是任何内置聚合的转换函数,但它被编写成在用作自定义聚合的转换函数时高效工作。)
The second argument of AggCheckCallContext can be used to retrieve the memory context in which aggregate state values are being kept. This is useful for transition functions that wish to use “expanded” objects (see Section 38.13.1) as their state values. On first call, the transition function should return an expanded object whose memory context is a child of the aggregate state context, and then keep returning the same expanded object on subsequent calls. See array_append() for an example. (array_append() is not the transition function of any built-in aggregate, but it is written to behave efficiently when used as transition function of a custom aggregate.)
聚合函数在 C 中编写的另一项可用的支持例程是_AggGetAggref_,它返回定义聚合调用的_Aggref_解析节点。这主要适用于有序集聚合,它可以检查_Aggref_节点的子结构以找出它们应该实现的排序类型。可以在 PostgreSQL 源代码的_orderedsetaggs.c_中找到示例。
Another support routine available to aggregate functions written in C is AggGetAggref, which returns the Aggref parse node that defines the aggregate call. This is mainly useful for ordered-set aggregates, which can inspect the substructure of the Aggref node to find out what sort ordering they are supposed to implement. Examples can be found in orderedsetaggs.c in the PostgreSQL source code.