Apache Spark supports processing various types of data. Not all expressions support all data types. The RAPIDS Accelerator for Apache Spark has further restrictions on what types are supported for processing. This tries to document what operations are supported and what data types each operation supports.
General limitations
Decimal
The Decimal
type in Spark supports a precision up to 38 digits (128-bits). The RAPIDS Accelerator stores values up to 64-bits and as such only supports a precision up to 18 digits. Note that decimals are disabled by default in the plugin because they are supported by a small number of operations presently, which can result in a lot of data movement to and from the GPU, slowing down processing in some cases.
Timestamp
Timestamps in Spark will all be converted to the local time zone before processing and are often converted to UTC before being stored, like in Parquet or ORC. The RAPIDS Accelerator only supports UTC as the time zone for timestamps.
CalendarInterval
In Spark CalendarInterval
s store three values, months, days, and microseconds. Support for this type is still very limited in the accelerator, and generally only days are supported.
Configuration
There are lots of different configuration values that can impact if an operation is supported or not. Some of these are a part of the RAPIDS Accelerator and cover the level of compatibility with Apache Spark. Those are covered here. Others are a part of Apache Spark itself and those are a bit harder to document. The work of updating this to cover that support is still ongoing.
In general though if you ever have any question about why an operation is not running on the GPU you may set spark.rapids.sql.explain
to ALL and it will try to give all of the reasons why this particular operator or expression is on the CPU or GPU.
Key
Types
Type Name | Type Description |
---|---|
BOOLEAN | Holds true or false values. |
BYTE | Signed 8-bit integer value. |
SHORT | Signed 16-bit integer value. |
INT | Signed 32-bit integer value. |
LONG | Signed 64-bit integer value. |
FLOAT | 32-bit floating point value. |
DOUBLE | 64-bit floating point value. |
DATE | A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970. |
TIMESTAMP | A date and time. Stored as 64-bit integer with microseconds since Jan 1, 1970 in the current time zone. |
STRING | A text string. Stored as UTF-8 encoded bytes. |
DECIMAL | A fixed point decimal value with configurable precision and scale. |
NULL | Only stores null values and is typically only used when no other type can be determined from the SQL. |
BINARY | An array of non-nullable bytes. |
CALENDAR | Represents a period of time. Stored as months, days and microseconds. |
ARRAY | A sequence of elements. |
MAP | A set of key value pairs, the keys cannot be null. |
STRUCT | A series of named fields. |
UDT | User defined types and java Objects. These are not standard SQL types. |
Support
Value | Description |
---|---|
S | (Supported) Both Apache Spark and the RAPIDS Accelerator support this type. |
S* | (Supported with limitations) Typically this refers to general limitations with Timestamp or Decimal |
(Not Applicable) Neither Spark not the RAPIDS Accelerator support this type in this situation. | |
PS | (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. An explanation for what is missing will be included with this. |
PS* | (Partial Support with limitations) Like regular Partial Support but with general limitations on Timestamp or Decimal types. |
NS | (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not. |
SparkPlan or Executor Nodes
Apache Spark uses a Directed Acyclic Graph(DAG) of processing to build a query. The nodes in this graph are instances of SparkPlan
and represent various high level operations like doing a filter or project. The operations that the RAPIDS Accelerator supports are described below.
Executor | Description | Notes | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CoalesceExec | The backend for the dataframe coalesce method | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
CollectLimitExec | Reduce to single partition and apply limit | This is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | NS | NS | NS | NS |
ExpandExec | The backend for the expand operator | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
FileSourceScanExec | Reading data from files, often from Hive tables | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | PS* (missing nested DECIMAL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, BINARY, CALENDAR, UDT) | NS |
FilterExec | The backend for most filter statements | None | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | NS |
GenerateExec | The backend for operations that generate more output rows than input rows like explode | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | PS* (Only literal arrays and the output of the array function are supported; missing nested DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) | NS | NS | NS |
GlobalLimitExec | Limiting of results across partitions | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | NS | NS | NS | NS |
LocalLimitExec | Per-partition limiting of results | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | NS | NS | NS | NS |
ProjectExec | The backend for most select, withColumn and dropColumn statements | None | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | NS |
RangeExec | The backend for range operator | None | S | |||||||||||||||||
SortExec | The backend for the sort operator | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
UnionExec | The backend for the union operator | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
CustomShuffleReaderExec | A wrapper of shuffle query stage | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | NS | NS | NS | NS |
HashAggregateExec | The backend for hash based aggregations | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | PS (missing nested BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) | NS | NS |
SortAggregateExec | The backend for sort based aggregations | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | PS (missing nested BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) | NS | NS |
DataWritingCommandExec | Writing data | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | NS | NS | NS | NS |
BatchScanExec | The backend for most file input | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, UDT) | NS |
BroadcastExchangeExec | The backend for broadcast exchange of data | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
ShuffleExchangeExec | The backend for most data being exchanged between processes | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
BroadcastHashJoinExec | Implementation of join using broadcast data | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
BroadcastNestedLoopJoinExec | Implementation of join using brute force | This is disabled by default because large joins can cause out of memory errors | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
CartesianProductExec | Implementation of join using brute force | This is disabled by default because large joins can cause out of memory errors | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
ShuffledHashJoinExec | Implementation of join using hashed shuffled data | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
SortMergeJoinExec | Sort merge join, replacing with shuffled hash join | None | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
ArrowEvalPythonExec | The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | NS | NS | NS | NS |
WindowInPandasExec | The backend for Window Aggregation Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row based window frame. | This is disabled by default because it only supports row based frame for now | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) | NS | NS | NS |
WindowExec | Window-operator backend | None | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | NS | NS | NS | NS |
- as was stated previously Decimal is only supported up to a precision of 18 and Timestamp is only supported in the UTC time zone. Decimals are off by default due to performance impact in some cases.
Expression and SQL Functions
Inside each node in the DAG there can be one or more trees of expressions that describe various types of processing that happens in that part of the plan. These can be things like adding two numbers together or checking for null. These expressions can have multiple input parameters and one output value. These expressions also can happen in different contexts. Because of how the accelerator works different contexts have different levels of support.
The most common expression context is project
. In this context values from a single input row go through the expression and the result will also be use to produce something in the same row. Be aware that even in the case of aggregation and window operations most of the processing is still done in the project context either before or after the other processing happens.
Aggregation operations like count or sum can take place in either the aggregation
, reduction
, or window
context. aggregation
is when the operation was done while grouping the data by one or more keys. reduction
is when there is no group by and there is a single result for an entire column. window
is for window operations.
The final expression context is lambda
which happens primarily for higher order functions in SQL. Accelerator support is described below.
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abs | `abs` | Absolute value | None | project | input | S | S | S | S | S | S | NS | |||||||||||
result | S | S | S | S | S | S | NS | ||||||||||||||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||||
Acos | `acos` | Inverse cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Acosh | `acosh` | Inverse hyperbolic cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Add | `+` | Addition | None | project | lhs | S | S | S | S | S | S | NS | NS | ||||||||||
rhs | S | S | S | S | S | S | NS | NS | |||||||||||||||
result | S | S | S | S | S | S | NS | NS | |||||||||||||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
Alias | Gives a column a name | None | project | input | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | NS | |
result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | NS | |||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
And | `and` | Logical AND | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | lhs | NS | |||||||||||||||||||||
rhs | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
AnsiCast | Convert a column of one type of data into another type | None | project | input | S | S | S | S | S | S | S | S | S* | S | NS | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | S* | S | NS | S | S | NS | NS | NS | NS | NS | |||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Asin | `asin` | Inverse sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Asinh | `asinh` | Inverse hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
AtLeastNNonNulls | Checks if number of non null/Nan values is greater than a given value | None | project | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | PS* (missing nested DECIMAL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, BINARY, CALENDAR, UDT) | NS | |
result | S | ||||||||||||||||||||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | ||||||||||||||||||||||
Atan | `atan` | Inverse tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Atanh | `atanh` | Inverse hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
AttributeReference | References an input column | None | project | result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | NS | |
lambda | result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
BitwiseAnd | `&` | Returns the bitwise AND of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
lambda | lhs | NS | NS | NS | NS | ||||||||||||||||||
rhs | NS | NS | NS | NS | |||||||||||||||||||
result | NS | NS | NS | NS | |||||||||||||||||||
BitwiseNot | `~` | Returns the bitwise NOT of the operands | None | project | input | S | S | S | S | ||||||||||||||
result | S | S | S | S | |||||||||||||||||||
lambda | input | NS | NS | NS | NS | ||||||||||||||||||
result | NS | NS | NS | NS | |||||||||||||||||||
BitwiseOr | `\|` | Returns the bitwise OR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
lambda | lhs | NS | NS | NS | NS | ||||||||||||||||||
rhs | NS | NS | NS | NS | |||||||||||||||||||
result | NS | NS | NS | NS | |||||||||||||||||||
BitwiseXor | `^` | Returns the bitwise XOR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
lambda | lhs | NS | NS | NS | NS | ||||||||||||||||||
rhs | NS | NS | NS | NS | |||||||||||||||||||
result | NS | NS | NS | NS | |||||||||||||||||||
CaseWhen | `when` | CASE WHEN expression | None | project | predicate | S | |||||||||||||||||
value | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
lambda | predicate | NS | |||||||||||||||||||||
value | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Cast | `timestamp`, `tinyint`, `binary`, `float`, `smallint`, `string`, `decimal`, `double`, `boolean`, `cast`, `date`, `int`, `bigint` | Convert a column of one type of data into another type | None | project | input | S | S | S | S | S | S | S | S | S* | S | NS | S | S | NS | NS | NS | NS | NS |
result | S | S | S | S | S | S | S | S | S* | S | NS | S | S | NS | NS | NS | NS | NS | |||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Cbrt | `cbrt` | Cube root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Ceil | `ceiling`, `ceil` | Ceiling of a number | None | project | input | S | S | NS | |||||||||||||||
result | S | S | NS | ||||||||||||||||||||
lambda | input | NS | NS | NS | |||||||||||||||||||
result | NS | NS | NS | ||||||||||||||||||||
Coalesce | `coalesce` | Returns the first non-null argument if exists. Otherwise, null | None | project | param | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
lambda | param | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Concat | `concat` | String concatenate NO separator | None | project | input | S | NS | NS | |||||||||||||||
result | S | NS | NS | ||||||||||||||||||||
lambda | input | NS | NS | NS | |||||||||||||||||||
result | NS | NS | NS | ||||||||||||||||||||
Contains | Contains | None | project | src | S | ||||||||||||||||||
search | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
search | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Cos | `cos` | Cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Cosh | `cosh` | Hyperbolic cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Cot | `cot` | Cotangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
DateAdd | `date_add` | Returns the date that is num_days after start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | startDate | NS | |||||||||||||||||||||
days | NS | NS | NS | ||||||||||||||||||||
result | NS | ||||||||||||||||||||||
DateDiff | `datediff` | Returns the number of days from startDate to endDate | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | lhs | NS | |||||||||||||||||||||
rhs | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
DateSub | `date_sub` | Returns the date that is num_days before start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | startDate | NS | |||||||||||||||||||||
days | NS | NS | NS | ||||||||||||||||||||
result | NS | ||||||||||||||||||||||
DayOfMonth | `dayofmonth`, `day` | Returns the day of the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
DayOfWeek | `dayofweek` | Returns the day of the week (1 = Sunday...7=Saturday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
DayOfYear | `dayofyear` | Returns the day of the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Divide | `/` | Division | None | project | lhs | S | NS | ||||||||||||||||
rhs | S | NS | |||||||||||||||||||||
result | S | NS | |||||||||||||||||||||
lambda | lhs | NS | NS | ||||||||||||||||||||
rhs | NS | NS | |||||||||||||||||||||
result | NS | NS | |||||||||||||||||||||
EndsWith | Ends with | None | project | src | S | ||||||||||||||||||
search | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
search | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
EqualNullSafe | `<=>` | Check if the values are equal including nulls <=> | None | project | lhs | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
rhs | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
EqualTo | `=`, `==` | Check if the values are equal | None | project | lhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS |
rhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Exp | `exp` | Euler's number e raised to a power | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Expm1 | `expm1` | Euler's number e raised to a power minus 1 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Floor | `floor` | Floor of a number | None | project | input | S | S | NS | |||||||||||||||
result | S | S | NS | ||||||||||||||||||||
lambda | input | NS | NS | NS | |||||||||||||||||||
result | NS | NS | NS | ||||||||||||||||||||
FromUnixTime | `from_unixtime` | Get the string from a unix timestamp | None | project | sec | S | |||||||||||||||||
format | PS (Only a limited number of formats are supported; Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | sec | NS | |||||||||||||||||||||
format | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
GetArrayItem | Gets the field at `ordinal` in the Array | None | project | array | PS* (missing nested DECIMAL, BINARY, CALENDAR, MAP, UDT) | ||||||||||||||||||
ordinal | PS (Literal value only) | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | PS* (missing nested DECIMAL, BINARY, CALENDAR, MAP, UDT) | NS | PS* (missing nested DECIMAL, BINARY, CALENDAR, MAP, UDT) | NS | |||||
lambda | array | NS | |||||||||||||||||||||
ordinal | NS | ||||||||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
GetMapValue | Gets Value from a Map based on a key | None | project | map | PS (missing nested BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) | ||||||||||||||||||
key | NS | NS | NS | NS | NS | NS | NS | NS | NS | PS (Literal value only) | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | S | NS | NS | NS | NS | NS | NS | NS | NS | |||||
lambda | map | NS | |||||||||||||||||||||
key | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
GetStructField | Gets the named field of the struct | None | project | input | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, UDT) | ||||||||||||||||||
result | S | S | S | S | S | S | S | S | S* | S | NS | NS | NS | NS | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, UDT) | PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, UDT) | NS | |||||
lambda | input | NS | |||||||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
GreaterThan | `>` | > operator | None | project | lhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS |
rhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
GreaterThanOrEqual | `>=` | >= operator | None | project | lhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS |
rhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Greatest | `greatest` | Returns the greatest value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
lambda | param | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
Hour | `hour` | Returns the hour component of the string/timestamp | None | project | input | S* | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
If | `if` | IF expression | None | project | predicate | PS (literal values are not supported) | |||||||||||||||||
trueValue | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
falseValue | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
lambda | predicate | NS | |||||||||||||||||||||
trueValue | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
falseValue | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
In | `in` | IN operator | None | project | value | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
list | PS (Literal value only) | PS (Literal value only) | PS (Literal value only) | PS (Literal value only) | PS (Literal value only) | PS (Literal value only) | PS (Literal value only) | PS (Literal value only) | PS* (Literal value only) | PS (Literal value only) | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | S | ||||||||||||||||||||||
lambda | value | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
list | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | ||||||||||||||||||||||
InSet | INSET operator | None | project | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |
result | S | ||||||||||||||||||||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | ||||||||||||||||||||||
InitCap | `initcap` | Returns str with the first letter of each word in uppercase. All other letters are in lowercase | This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see https://github.com/rapidsai/cudf/issues/3132 Spark also only sees the space character as a word deliminator, but this uses more white space characters. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
InputFileBlockLength | `input_file_block_length` | Returns the length of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
lambda | result | NS | |||||||||||||||||||||
InputFileBlockStart | `input_file_block_start` | Returns the start offset of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
lambda | result | NS | |||||||||||||||||||||
InputFileName | `input_file_name` | Returns the name of the file being read, or empty string if not available | None | project | result | S | |||||||||||||||||
lambda | result | NS | |||||||||||||||||||||
IntegralDivide | `div` | Division with a integer result | None | project | lhs | S | NS | ||||||||||||||||
rhs | S | NS | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | lhs | NS | NS | ||||||||||||||||||||
rhs | NS | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
IsNaN | `isnan` | Checks if a value is NaN | None | project | input | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | NS | ||||||||||||||||||||
result | NS | ||||||||||||||||||||||
IsNotNull | `isnotnull` | Checks if a value is not null | None | project | input | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | NS |
result | S | ||||||||||||||||||||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | ||||||||||||||||||||||
IsNull | `isnull` | Checks if a value is null | None | project | input | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | PS* (missing nested BINARY, CALENDAR, UDT) | NS |
result | S | ||||||||||||||||||||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | ||||||||||||||||||||||
KnownFloatingPointNormalized | Tag to prevent redundant normalization | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
lambda | input | NS | NS | ||||||||||||||||||||
result | NS | NS | |||||||||||||||||||||
Lag | `lag` | Window function that returns N entries behind this one | None | window | input | S | S | S | S | S | S | S | S | S* | NS | NS | NS | NS | NS | NS | NS | NS | NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | S* | NS | NS | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
LastDay | `last_day` | Returns the last day of the month which the date belongs to | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Lead | `lead` | Window function that returns N entries ahead of this one | None | window | input | S | S | S | S | S | S | S | S | S* | NS | NS | NS | NS | NS | NS | NS | NS | NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | S* | NS | NS | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Least | `least` | Returns the least value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
lambda | param | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
Length | `length`, `character_length`, `char_length` | String character length or binary byte length | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | NS | ||||||||||||||||||||
result | NS | ||||||||||||||||||||||
LessThan | `<` | < operator | None | project | lhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS |
rhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
LessThanOrEqual | `<=` | <= operator | None | project | lhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS |
rhs | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | NS | NS | NS | NS | NS | |||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Like | `like` | Like | None | project | src | S | |||||||||||||||||
search | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
search | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Literal | Holds a static value from the query | None | project | result | S | S | S | S | S | S | S | S | S* | S | S* | S | NS | S | NS | NS | NS | NS | |
lambda | result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
Log | `ln` | Natural log | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Log10 | `log10` | Log base 10 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Log1p | `log1p` | Natural log 1 + expr | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Log2 | `log2` | Log base 2 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Logarithm | `log` | Log variable base | None | project | value | S | |||||||||||||||||
base | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | value | NS | |||||||||||||||||||||
base | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Lower | `lower`, `lcase` | String lowercase operator | This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see https://github.com/rapidsai/cudf/issues/3132 | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Md5 | `md5` | MD5 hash operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Minute | `minute` | Returns the minute component of the string/timestamp | None | project | input | S* | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
MonotonicallyIncreasingID | `monotonically_increasing_id` | Returns monotonically increasing 64-bit integers | None | project | result | S | |||||||||||||||||
lambda | result | NS | |||||||||||||||||||||
Month | `month` | Returns the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Multiply | `*` | Multiplication | None | project | lhs | S | S | S | S | S | S | NS | |||||||||||
rhs | S | S | S | S | S | S | NS | ||||||||||||||||
result | S | S | S | S | S | S | NS | ||||||||||||||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
rhs | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||||
NaNvl | `nanvl` | Evaluates to `left` iff left is not NaN, `right` otherwise | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | S | |||||||||||||||||||||
lambda | lhs | NS | NS | ||||||||||||||||||||
rhs | NS | NS | |||||||||||||||||||||
result | NS | NS | |||||||||||||||||||||
Not | `!`, `not` | Boolean not operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Or | `or` | Logical OR | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | lhs | NS | |||||||||||||||||||||
rhs | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Pmod | `pmod` | Pmod | None | project | lhs | S | S | S | S | S | S | NS | |||||||||||
rhs | S | S | S | S | S | S | NS | ||||||||||||||||
result | S | S | S | S | S | S | NS | ||||||||||||||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
rhs | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||||
Pow | `pow`, `power` | lhs ^ rhs | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | lhs | NS | |||||||||||||||||||||
rhs | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Quarter | `quarter` | Returns the quarter of the year for date, in the range 1 to 4 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Rand | `random`, `rand` | Generate a random column with i.i.d. uniformly distributed values in [0, 1) | None | project | seed | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | seed | NS | NS | ||||||||||||||||||||
result | NS | ||||||||||||||||||||||
RegExpReplace | `regexp_replace` | RegExpReplace support for string literal input patterns | None | project | str | S | |||||||||||||||||
regex | PS (very limited regex support; Literal value only) | ||||||||||||||||||||||
rep | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | str | NS | |||||||||||||||||||||
regex | NS | ||||||||||||||||||||||
rep | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Remainder | `%`, `mod` | Remainder or modulo | None | project | lhs | S | S | S | S | S | S | NS | |||||||||||
rhs | S | S | S | S | S | S | NS | ||||||||||||||||
result | S | S | S | S | S | S | NS | ||||||||||||||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
rhs | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||||
Rint | `rint` | Rounds up a double value to the nearest double equal to an integer | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
RowNumber | `row_number` | Window function that returns the index for the row within the aggregation window | None | window | result | S | |||||||||||||||||
Second | `second` | Returns the second component of the string/timestamp | None | project | input | S* | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
ShiftLeft | `shiftleft` | Bitwise shift left (<<) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
lambda | value | NS | NS | ||||||||||||||||||||
amount | NS | ||||||||||||||||||||||
result | NS | NS | |||||||||||||||||||||
ShiftRight | `shiftright` | Bitwise shift right (>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
lambda | value | NS | NS | ||||||||||||||||||||
amount | NS | ||||||||||||||||||||||
result | NS | NS | |||||||||||||||||||||
ShiftRightUnsigned | `shiftrightunsigned` | Bitwise unsigned shift right (>>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
lambda | value | NS | NS | ||||||||||||||||||||
amount | NS | ||||||||||||||||||||||
result | NS | NS | |||||||||||||||||||||
Signum | `sign`, `signum` | Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Sin | `sin` | Sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Sinh | `sinh` | Hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
SparkPartitionID | `spark_partition_id` | Returns the current partition id | None | project | result | S | |||||||||||||||||
lambda | result | NS | |||||||||||||||||||||
Sqrt | `sqrt` | Square root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StartsWith | Starts with | None | project | src | S | ||||||||||||||||||
search | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
search | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringLPad | `lpad` | Pad a string on the left | None | project | str | S | |||||||||||||||||
len | PS (Literal value only) | ||||||||||||||||||||||
pad | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | str | NS | |||||||||||||||||||||
len | NS | ||||||||||||||||||||||
pad | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringLocate | `position`, `locate` | Substring search operator | None | project | substr | PS (Literal value only) | |||||||||||||||||
str | S | ||||||||||||||||||||||
start | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | substr | NS | |||||||||||||||||||||
str | NS | ||||||||||||||||||||||
start | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringRPad | `rpad` | Pad a string on the right | None | project | str | S | |||||||||||||||||
len | PS (Literal value only) | ||||||||||||||||||||||
pad | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | str | NS | |||||||||||||||||||||
len | NS | ||||||||||||||||||||||
pad | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringReplace | `replace` | StringReplace operator | None | project | src | S | |||||||||||||||||
search | PS (Literal value only) | ||||||||||||||||||||||
replace | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
search | NS | ||||||||||||||||||||||
replace | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringSplit | `split` | Splits `str` around occurrences that match `regex` | None | project | str | S | |||||||||||||||||
regexp | PS (very limited subset of regex supported; Literal value only) | ||||||||||||||||||||||
limit | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | str | NS | |||||||||||||||||||||
regexp | NS | ||||||||||||||||||||||
limit | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringTrim | `trim` | StringTrim operator | None | project | src | S | |||||||||||||||||
trimStr | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
trimStr | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringTrimLeft | `ltrim` | StringTrimLeft operator | None | project | src | S | |||||||||||||||||
trimStr | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
trimStr | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StringTrimRight | `rtrim` | StringTrimRight operator | None | project | src | S | |||||||||||||||||
trimStr | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | src | NS | |||||||||||||||||||||
trimStr | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Substring | `substr`, `substring` | Substring operator | None | project | str | S | NS | ||||||||||||||||
pos | PS (Literal value only) | ||||||||||||||||||||||
len | PS (Literal value only) | ||||||||||||||||||||||
result | S | NS | |||||||||||||||||||||
lambda | str | NS | NS | ||||||||||||||||||||
pos | NS | ||||||||||||||||||||||
len | NS | ||||||||||||||||||||||
result | NS | NS | |||||||||||||||||||||
SubstringIndex | `substring_index` | substring_index operator | None | project | str | S | |||||||||||||||||
delim | PS (only a single character is allowed; Literal value only) | ||||||||||||||||||||||
count | PS (Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | str | NS | |||||||||||||||||||||
delim | NS | ||||||||||||||||||||||
count | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Subtract | `-` | Subtraction | None | project | lhs | S | S | S | S | S | S | NS | NS | ||||||||||
rhs | S | S | S | S | S | S | NS | NS | |||||||||||||||
result | S | S | S | S | S | S | NS | NS | |||||||||||||||
lambda | lhs | NS | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||
rhs | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
Tan | `tan` | Tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Tanh | `tanh` | Hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
TimeAdd | Adds interval to timestamp | None | project | start | S* | ||||||||||||||||||
interval | PS (month intervals are not supported; Literal value only) | ||||||||||||||||||||||
result | S* | ||||||||||||||||||||||
lambda | start | NS | |||||||||||||||||||||
interval | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
TimeSub | Subtracts interval from timestamp | None | project | start | S* | ||||||||||||||||||
interval | PS (months not supported; Literal value only) | ||||||||||||||||||||||
result | S* | ||||||||||||||||||||||
lambda | start | NS | |||||||||||||||||||||
interval | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
ToDegrees | `degrees` | Converts radians to degrees | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
ToRadians | `radians` | Converts degrees to radians | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
ToUnixTimestamp | `to_unix_timestamp` | Returns the UNIX timestamp of the given time | None | project | timeExp | S | S* | S | |||||||||||||||
format | PS (A limited number of formats are supported; Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | timeExp | NS | NS | NS | |||||||||||||||||||
format | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
UnaryMinus | `negative` | Negate a numeric value | None | project | input | S | S | S | S | S | S | NS | NS | ||||||||||
result | S | S | S | S | S | S | NS | NS | |||||||||||||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
UnaryPositive | `positive` | A numeric value with a + in front of it | None | project | input | S | S | S | S | S | S | NS | NS | ||||||||||
result | S | S | S | S | S | S | NS | NS | |||||||||||||||
lambda | input | NS | NS | NS | NS | NS | NS | NS | NS | ||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
UnixTimestamp | `unix_timestamp` | Returns the UNIX timestamp of current or specified time | None | project | timeExp | S | S* | S | |||||||||||||||
format | PS (A limited number of formats are supported; Literal value only) | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | timeExp | NS | NS | NS | |||||||||||||||||||
format | NS | ||||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Upper | `upper`, `ucase` | String uppercase operator | This is not 100% compatible with the Spark version because in some cases unicode characters change byte width when changing the case. The GPU string conversion does not support these characters. For a full list of unsupported characters see https://github.com/rapidsai/cudf/issues/3132 | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
WeekDay | `weekday` | Returns the day of the week (0 = Monday...6=Sunday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Year | `year` | Returns the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
lambda | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Average | `avg`, `mean` | Average aggregate operator | None | aggregation | input | S | S | S | S | S | S | NS | |||||||||||
result | S | NS | |||||||||||||||||||||
reduction | input | S | S | S | S | S | S | NS | |||||||||||||||
result | S | NS | |||||||||||||||||||||
window | input | NS | NS | NS | NS | NS | NS | NS | |||||||||||||||
result | NS | NS | |||||||||||||||||||||
Count | `count` | Count aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | ||||
result | S | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | ||||
result | S | ||||||||||||||||||||||
First | `first_value`, `first` | first aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
ignoreNulls | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
reduction | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | ||||
ignoreNulls | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
window | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
ignoreNulls | NS | ||||||||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Last | `last`, `last_value` | last aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS |
ignoreNulls | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
reduction | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | ||||
ignoreNulls | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | NS | |||||
window | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
ignoreNulls | NS | ||||||||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Max | `max` | Max aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
reduction | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
window | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
Min | `min` | Min aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
reduction | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
window | input | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | S* | S | NS | S | NS | NS | NS | NS | NS | ||||||
Sum | `sum` | Sum aggregate operator | None | aggregation | input | S | S | S | S | S | S | NS | |||||||||||
result | S | S | NS | ||||||||||||||||||||
reduction | input | S | S | S | S | S | S | NS | |||||||||||||||
result | S | S | NS | ||||||||||||||||||||
window | input | S | S | S | S | S | S | NS | |||||||||||||||
result | S | S | NS | ||||||||||||||||||||
NormalizeNaNAndZero | Normalize NaN and zero | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
lambda | input | NS | NS | ||||||||||||||||||||
result | NS | NS |
- as was state previously Decimal is only supported up to a precision of 18 and Timestamp is only supported in the UTC time zone. Decimals are off by default due to performance impact in some cases.
Cast
The above table does not show what is and is not supported for cast very well. This table shows the matrix of supported casts. UDT types are not show as they are not supported by the accelerator, and have very limited support in Spark. Nested types like MAP, Struct, and Array can only be cast if the child types can be cast.
Some of the casts to/from string on the GPU are not 100% the same and are disabled by default. Please see the configs for more details on these specific cases.
TO | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | ||
FROM | BOOLEAN | S | S | S | S | S | S | S | S | S | NS | |||||||
BYTE | S | S | S | S | S | S | S | S | S | NS | S | |||||||
SHORT | S | S | S | S | S | S | S | S | S | NS | S | |||||||
INT | S | S | S | S | S | S | S | S | S | NS | S | |||||||
LONG | S | S | S | S | S | S | S | S | S | NS | S | |||||||
FLOAT | S | S | S | S | S | S | S | S | S | NS | ||||||||
DOUBLE | S | S | S | S | S | S | S | S | S | NS | ||||||||
DATE | S | S | S | S | S | S | S | S | S | S | NS | |||||||
TIMESTAMP | S | S | S | S | S | S | S | S | S | S | NS | |||||||
STRING | S | S | S | S | S | S | S | S | S | S | NS | S | NS | |||||
DECIMAL | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||
NULL | S | S | S | S | S | S | S | S | S | S | NS | NS | NS | NS | NS | NS | NS | |
BINARY | NS | NS | CALENDAR | NS | NS | </tr> </tr> | ARRAY | NS | NS | </tr> </tr> | MAP | NS | NS | </tr> </tr> | STRUCT | NS | NS | </tr>
Input/Output
For Input and Output it is not cleanly exposed what types are supported and which are not. This table tries to clarify that. Be aware that some types may be disabled in some cases for either reads or writes because of processing limitations, like rebasing dates or timestamps, or for a lack of type coercion support.
Format | Direction | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Parquet | Input | S | S | S | S | S | S | S | S | S | S | NS | NS | PS (missing nested DECIMAL, BINARY) | PS (missing nested DECIMAL, BINARY) | PS (missing nested DECIMAL, BINARY) | ||
Output | S | S | S | S | S | S | S | S | S | S | NS | NS | NS | NS | NS | |||
ORC | Input | S | S | S | S | S | S | S | S | S | S | NS | NS | NS | NS | NS | NS | |
Output | S | S | S | S | S | S | S | S | S | S | NS | NS | NS | NS | NS | NS | ||
CSV | Input | S | S | S | S | S | S | S | S | S | S | NS | NS | NS | NS | NS | NS | NS |