Combination of AnalyzeDataset and TransformDataset.
tft_beam.AnalyzeAndTransformDataset(
preprocessing_fn, output_record_batches=False
)
transformed, transform_fn = AnalyzeAndTransformDataset(
preprocessing_fn).expand(dataset)
should be equivalent to
transform_fn = AnalyzeDataset(preprocessing_fn).expand(dataset)
transformed = TransformDataset().expand((dataset, transform_fn))
but may be more efficient since it avoids multiple passes over the data.
Attributes | |
---|---|
label
|
Methods
annotations
annotations() -> Dict[str, Union[bytes, str, message.Message]]
default_label
default_label()
default_type_hints
default_type_hints()
display_data
display_data()
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
Returns | |
---|---|
Dict[str, Any]: A dictionary containing key:value pairs.
The value might be an integer, float, or string value; a
DisplayDataItem for values that have more data
(e.g. short value, label, url); or a HasDisplayData instance
that has more display data that should be picked up. For example:
{ 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
expand
expand(
dataset
)
Transform the dataset by applying the preprocessing_fn.
Args | |
---|---|
dataset
|
A dataset. |
Returns | |
---|---|
A (Dataset, TransformFn) pair containing the preprocessed dataset and the graph that maps the input to the output data. |
from_runner_api
@classmethod
from_runner_api( proto, context )
get_resource_hints
get_resource_hints()
get_type_hints
get_type_hints()
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order:
- Using self.default_type_hints().
- Using self.__class__ type hints.
get_windowing
get_windowing(
inputs
)
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or the first input if there are several).
infer_output_type
infer_output_type(
unused_input_type
)
register_urn
@classmethod
register_urn( urn, parameter_type, constructor=None )
runner_api_requires_keyed_input
runner_api_requires_keyed_input()
to_runner_api
to_runner_api(
context, has_parts=False, **extra_kwargs
)
to_runner_api_parameter
to_runner_api_parameter(
unused_context
)
to_runner_api_pickled
to_runner_api_pickled(
unused_context
)
type_check_inputs
type_check_inputs(
pvalueish
)
type_check_inputs_or_outputs
type_check_inputs_or_outputs(
pvalueish, input_or_output
)
type_check_outputs
type_check_outputs(
pvalueish
)
with_input_types
with_input_types(
input_type_hint
)
Annotates the input type of a PTransform with a type-hint.
Args | |
---|---|
input_type_hint
|
type
An instance of an allowed built-in type, a custom
class, or an instance of a
:class: |
Raises | |
---|---|
TypeError
|
If input_type_hint is not a valid type-hint.
See apache_beam.typehints.typehints.validate_composite_type_param()
for further details.
|
Returns | |
---|---|
PTransform
|
A reference to the instance of this particular
PTransform object. This allows chaining type-hinting related
methods.
|
with_output_types
with_output_types(
type_hint
)
Annotates the output type of a PTransform with a type-hint.
Args | |
---|---|
type_hint
|
type
An instance of an allowed built-in type, a custom class,
or a :class: |
Raises | |
---|---|
TypeError
|
If type_hint is not a valid type-hint. See
apache_beam.typehints.typehints.validate_composite_type_param()
for further details.
|
Returns | |
---|---|
PTransform
|
A reference to the instance of this particular
PTransform object. This allows chaining type-hinting related
methods.
|
with_resource_hints
with_resource_hints(
**kwargs
)
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam Runners. Runners may ignore unsupported hints.
Args | |
---|---|
**kwargs
|
key-value pairs describing hints and their values. |
Raises | |
---|---|
ValueError
|
If the provided hints are unknown to the SDK. See
apache_beam.transforms.resources for a list of known hints.
|
Returns | |
---|---|
PTransform
|
A reference to the instance of this particular
PTransform object.
|
__or__
__or__(
right
)
Used to compose PTransforms, e.g., ptransform1 | ptransform2.
__ror__
__ror__(
left, label=None
)
Used to apply this PTransform to non-PValues, e.g., a tuple.
__rrshift__
__rrshift__(
label
)
Class Variables | |
---|---|
pipeline |
None
|
side_inputs |
()
|