API Reference

condition module

class condition.FieldList(fields)[source]

Exposes each name in the list as a field attribute, which can then be used to construct field conditions.

Parameters

fields (Collection[str]) –

classmethod from_df(df)[source]

A shortcut that constructs a field list from the index names and column names of the dataframe.

Parameters

df (pandas.core.frame.DataFrame) –

Return type

condition._condition.FieldList

class condition.Condition[source]

Represents a condition object. It is immutable.

apply(application, **kwargs)[source]

Applies the ConditionApplication to this condition. This is an extension mechanism allowing you to implement the condition for different usage contexts.

Parameters

application (condition._condition.ConditionApplication) –

static register_application(name, application)[source]

Syntactic sugar that lets you call your ConditionApplication as if it were built into the Condition class. After registration, you can call cond.<name>(), which is equivalent to cond.apply(application()).

Parameters
  • name (str) – the method name. This must be unique

  • application (condition._condition.ConditionApplication) – your application class or object to be called by this method. If it is an object, your object must be able to handle concurrent calls. If it is a class, it must have a no-arg constructor, and a new object will be created for each call.
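The registration mechanism described above can be pictured with a small stand-alone toy. This is a sketch only, not the library's implementation: ToyCondition, Shout and the on_condition method name are hypothetical and exist purely for illustration.

```python
class ToyCondition:
    """Hypothetical stand-in for Condition; not the library's class."""

    def __init__(self, text):
        self.text = text

    def apply(self, application, **kwargs):
        # Delegate the work to the application object.
        return application.on_condition(self, **kwargs)

    @staticmethod
    def register_application(name, application):
        if hasattr(ToyCondition, name):
            raise ValueError(f"method name already in use: {name}")

        def method(self, **kwargs):
            # A class is instantiated on every call; an object is shared
            # by all calls, so it must tolerate concurrent use.
            app = application() if isinstance(application, type) else application
            return self.apply(app, **kwargs)

        setattr(ToyCondition, name, method)


class Shout:
    """Hypothetical application with a no-arg constructor."""

    def on_condition(self, cond, **kwargs):
        return cond.text.upper()


ToyCondition.register_application("shout", Shout)
result = ToyCondition("a > 1").shout()
```

Here the registered name becomes an ordinary method, so cond.shout() and cond.apply(Shout()) do the same work.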

set_param(name, val)[source]

Sets an additional param/value pair to pass to the end consumer. For example, the params can be used in SQL templates. Note that only the top-level condition’s params are used.

Parameters
  • name (str) – the param name. It will be available in jinja2 SQL template.

  • val (Any) – the value

Return type

None

to_sql_where_condition(db_map=None, indent=1)[source]

Generates a string representing the condition for use in a SQL WHERE clause.

Parameters
  • db_map (Optional[Dict[str, str]]) – map from a field name to a db field name. The db field name can also include a table alias. By default, field names are used directly.

  • indent (int) –

Returns

condition string for sql where clause.

Return type

str
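The role of db_map can be illustrated with a toy renderer. This is a sketch under stated assumptions: where_clause is a hypothetical helper, the output layout is invented for illustration, and literals are passed in as already-rendered SQL text. It does not reproduce the string that to_sql_where_condition actually generates.

```python
def where_clause(field_conditions, db_map=None):
    """Join (field, op, literal) triples with AND, renaming fields via db_map."""
    db_map = db_map or {}
    parts = []
    for field, op, literal in field_conditions:
        # Fall back to the field name itself when no mapping is given.
        column = db_map.get(field, field)
        parts.append(f"{column} {op} {literal}")
    return " AND ".join(parts)


clause = where_clause(
    [("A", ">", "100"), ("B", "=", "'x'")],
    db_map={"A": "t1.col_a"},  # db field name that includes a table alias
)
```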

get_all_field_conditions()[source]

Returns all FieldCondition objects contained in this condition.

Returns

a dict: field name -> list of FieldCondition for this field.

Return type

collections.OrderedDict
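The shape of the returned mapping can be sketched on a toy condition tree. The tuple encoding below (("field", name, op, value), ("and", [...]), ("or", [...])) and the collect_field_conditions helper are hypothetical; they only mimic the "field name -> list of conditions" structure described above.

```python
from collections import OrderedDict


def collect_field_conditions(cond, found=None):
    """Walk a toy condition tree and group leaf conditions by field name."""
    found = OrderedDict() if found is None else found
    if cond[0] == "field":
        found.setdefault(cond[1], []).append(cond)
    else:
        # "and" / "or" nodes carry their sub conditions in cond[1].
        for sub in cond[1]:
            collect_field_conditions(sub, found)
    return found


tree = ("and", [
    ("field", "A", ">", 1),
    ("or", [("field", "B", "==", "x"), ("field", "A", "<", 10)]),
])
by_field = collect_field_conditions(tree)
```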

to_sql_dict(dbmap=None)[source]

Generates a dict to pass into a sql template.

Before you write your SQL template, you can call this method and print out the dict (or its keys) to see what is available to use in the template.

See also usage examples.

Parameters

dbmap (Optional[Dict[str, str]]) – to map to the actual db field name (optionally with alias) when generating “where_condition”

Returns

the dict

Return type

Dict[str, Any]

to_df_query()[source]
Returns

a string representing the condition to be used in df.query()

Return type

str

query(df)[source]

Queries the passed in dataframe with this condition.

Parameters

df (pandas.core.frame.DataFrame) – the dataframe to query. Each field in the condition must match a column or an index level in the dataframe.

Returns

a dataframe whose rows satisfy this condition.

Return type

pandas.core.frame.DataFrame

static from_pyarrow_filter(filters=None)[source]

Constructs a condition from pyarrow style filters.

Parameters

filters (Optional[Union[List[Tuple], List[List[Tuple]]]]) – pyarrow filters. See pyarrow_read_table .

Return type

condition._condition.Condition
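The two filter shapes accepted above follow pyarrow's convention: a flat list of (column, op, value) tuples is a conjunction, and a list of such lists is a disjunction of conjunctions. The sketch below evaluates such filters against a plain dict to make that structure concrete. The matches helper is hypothetical and is not how from_pyarrow_filter is implemented.

```python
import operator

OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def matches(filters, record):
    """Evaluate pyarrow-style filters against a dict record."""
    if not filters:
        return True  # no filter means every record passes
    # A flat list of tuples is a single AND group; wrap it so both shapes
    # are handled as an OR over AND groups.
    groups = [filters] if isinstance(filters[0], tuple) else filters
    return any(
        all(OPS[op](record[col], val) for col, op, val in group)
        for group in groups
    )


flat = [("year", "=", 2020), ("city", "in", {"NY", "LA"})]
nested = [[("year", "=", 2020)], [("city", "=", "SF")]]
```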

eval(record_dict, type_conversion=False)[source]

Evaluates the condition against the record and returns True or False. Note that if you have a large number of records, the recommended way to evaluate them in batch is to create a pandas DataFrame from the records and then call condition.query(df). You can install the numexpr package for much faster performance.

Parameters
  • record_dict (Dict) – a dict from a field to a value. Used to test FieldCondition.

  • type_conversion (bool) – if True, convert the value in record_dict to the FieldCondition value type before comparison. Such a conversion is sometimes needed, for example, in pyarrow partition filtering.

Return type

bool
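Why type_conversion matters can be seen in a minimal sketch. It assumes, for illustration, that a partition value may arrive as a string while the condition holds an int. The equals helper is hypothetical and covers equality only.

```python
def equals(record_value, condition_value, type_conversion=False):
    """Compare a record value to a condition value, optionally converting first."""
    if type_conversion:
        # Convert the record value to the type of the condition value,
        # e.g. the string '2020' to the int 2020.
        record_value = type(condition_value)(record_value)
    return record_value == condition_value


without_conversion = equals("2020", 2020)
with_conversion = equals("2020", 2020, type_conversion=True)
```

Without conversion the string and the int never compare equal, so a record that logically matches would be rejected.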

normalize()[source]

Normalizes the condition to be one of the following:

  • a FieldCondition

  • an And with a list of sub FieldCondition

  • an Or with a list of sub conditions as defined above.

In some cases, e.g., pyarrow filtering, the above restrictions must be followed. Any condition can be rewritten in this form without changing its meaning.

For example, (a | b) & (c | d) & e will be normalized to (a & c & e) | (a & d & e) | (b & c & e) | (b & d & e).
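The expansion in the example above is a Cartesian product over the Or groups. A minimal sketch, assuming each group is given as a plain list of names; distribute is a hypothetical helper, not the library's normalize():

```python
from itertools import product


def distribute(and_of_ors):
    """Expand an AND of OR-groups into an OR of AND-groups.

    Each inner list is one OR-group; each tuple in the result is one
    AND-group of the normalized form.
    """
    return list(product(*and_of_ors))


# (a | b) & (c | d) & e
groups = distribute([["a", "b"], ["c", "d"], ["e"]])
```

Each resulting tuple picks one member from every Or group, which is why two groups of two and one group of one yield four And groups.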

Returns

an equivalent normalized condition.

Return type

condition._condition.Condition

to_pyarrow_filter()[source]

Generates filters that can be passed to pyarrow.parquet.ParquetDataset or pandas.read_parquet in order to read only the selected partitions, thereby increasing efficiency. Note that field conditions that do not match a partition key are ignored, so you should follow up with condition.query(df) to filter out the remaining unwanted rows.

See also usage examples.

Return type

Union[List[Tuple], List[List[Tuple]]]

add_date_condition(date_field, from_date=None, to_date=None, to_exclusive=False, date_format=None)[source]

Adds to this condition the requirement that the date field falls within the passed in date range. This is a convenience method for working with time series.

Parameters
  • date_field (condition._condition.Field) – the date field

  • from_date (Optional[Union[str, datetime.datetime]]) – if not None, the date field must be greater than or equal to this datetime value

  • to_date (Optional[Union[str, datetime.datetime]]) – if not None, the date field must be less than this datetime value, or less than or equal to it when to_exclusive is False

  • to_exclusive (Optional[bool]) – if False (the default), the date field can be equal to to_date

  • date_format (Optional[str]) – the format used to convert the date to a str. Defaults to None, in which case no conversion is done.

Return type

condition._condition.Condition
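The bounds described by from_date, to_date and to_exclusive can be written out as a plain predicate. This is a sketch of the documented semantics only; in_date_range is a hypothetical helper and does not build a Condition.

```python
from datetime import datetime


def in_date_range(value, from_date=None, to_date=None, to_exclusive=False):
    """Return True if value lies in the range; a None bound is unbounded."""
    if from_date is not None and value < from_date:
        return False
    if to_date is not None:
        if to_exclusive and value >= to_date:
            return False
        if not to_exclusive and value > to_date:
            return False
    return True


jan_1 = datetime(2020, 1, 1)
jan_31 = datetime(2020, 1, 31)
inclusive_end = in_date_range(jan_31, jan_1, jan_31)
exclusive_end = in_date_range(jan_31, jan_1, jan_31, to_exclusive=True)
```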

add_daterange_overlap_condition(from_date_field=None, to_date_field=None, from_date=None, to_date=None, to_exclusive=False, date_format=None)[source]

Adds to this condition the requirement that the range spanned by the two date fields overlaps the passed in date range. This is a convenience method for working with time series.

Parameters
  • from_date_field (Optional[condition._condition.Field]) – the from date field

  • to_date_field (Optional[condition._condition.Field]) – the to date field

  • from_date (Optional[Union[str, datetime.datetime]]) – if not None, the to_date_field must be greater than or equal to this datetime value

  • to_date (Optional[Union[str, datetime.datetime]]) – if not None, the from_date_field must be less than this datetime value, or less than or equal to it when to_exclusive is False

  • to_exclusive (Optional[bool]) – if False (the default), the from_date_field can be equal to to_date

  • date_format (Optional[str]) – the format used to convert the date to a str. Defaults to None, in which case no conversion is done.

Return type

condition._condition.Condition
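The overlap test described by the parameters above compares each bound of the query range with the opposite end of the row's range. A minimal sketch of that logic follows. The overlaps helper and the row_from/row_to names are hypothetical; they stand for the values of from_date_field and to_date_field in one row.

```python
from datetime import datetime


def overlaps(row_from, row_to, from_date=None, to_date=None, to_exclusive=False):
    """Return True if the row's [row_from, row_to] overlaps the queried range."""
    if from_date is not None and row_to < from_date:
        return False  # the row ends before the queried range starts
    if to_date is not None:
        if to_exclusive and row_from >= to_date:
            return False  # the row starts at or after an exclusive end
        if not to_exclusive and row_from > to_date:
            return False  # the row starts after an inclusive end
    return True


q_from, q_to = datetime(2020, 1, 1), datetime(2020, 2, 1)
touching = overlaps(datetime(2020, 2, 1), datetime(2020, 3, 1), q_from, q_to)
touching_exclusive = overlaps(
    datetime(2020, 2, 1), datetime(2020, 3, 1), q_from, q_to, to_exclusive=True
)
```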

visualize(filename=None, view=False)[source]

Visualizes the structure of this condition as a ‘png’ image. This method requires the graphviz package to be available.

Parameters
  • filename – the path to output the ‘png’ file.

  • view (bool) – if True, show the picture

Return type

Any

split(fields, field_map=None)[source]

Derives from this condition a new condition which contains only the passed in fields. This method is used in the following scenario:

  1. A combined data item is joined from two or more sub data sources.

  2. The condition is defined on the combined data.

  3. Use this method to get a split condition to be applied to the sub data sources with the fields list in the sub data sources.

  4. There may be a field mapping from this condition to the target sub data sources. If so, the split will be mapped to the target fields.

  5. After the data is joined, apply the original condition on the combined data.

Parameters
  • fields (Union[str, condition._condition.Field, condition._condition.FieldList, Collection[Union[str, condition._condition.Field]]]) – a FieldList or a collection of target fields (str or Field) to retain.

  • field_map (Optional[Union[Dict[str, str], Dict[condition._condition.Field, condition._condition.Field]]]) – map from a field in this condition to the target field. If None, keep the field name.

Returns

the condition to be applied for a data source with only the passed in fields. Returns None if no condition should be applied, namely, assuming True for each row.

Return type

condition._condition.Condition
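One way to picture the split described above is on a toy condition tree. The sketch below is an assumption-laden illustration, not the library's algorithm: the tuple encoding and the split function are hypothetical, and the handling of Or (dropping the whole Or when any branch has no retained field) is one safe choice that the library may or may not share.

```python
def split(cond, fields, field_map=None):
    """Keep only the parts of cond that mention the retained target fields.

    Returns None when no condition should be applied (treat as True).
    """
    field_map = field_map or {}
    kind = cond[0]
    if kind == "field":
        _, name, op, val = cond
        target = field_map.get(name, name)
        return ("field", target, op, val) if target in fields else None
    subs = [split(sub, fields, field_map) for sub in cond[1]]
    if kind == "and":
        # Dropping a conjunct only loosens the filter, so keep what remains.
        kept = [s for s in subs if s is not None]
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else ("and", kept)
    # kind == "or": if any branch is unconstrained, the whole Or is too.
    if any(s is None for s in subs):
        return None
    return ("or", subs)


cond = ("and", [
    ("field", "A", ">", 1),
    ("field", "B", "==", "x"),
    ("or", [("field", "A", "<", 10), ("field", "C", "==", 3)]),
])
only_a = split(cond, {"A"})
mapped = split(cond, {"a_db"}, field_map={"A": "a_db"})
```

Because the split condition is never stricter than the original, applying it to a sub data source cannot drop rows that the original condition would keep after the join, which is why the original condition is still applied afterwards.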

static parse(condition_str, field_list=None, field_list_name='fl')[source]

Parses a str into a condition object. The parse method is safe in that no irrelevant function or class can be called from the string. T() is a shortcut for pd.to_datetime() that converts a string to a datetime.

Examples: Below, cond1, cond2 and cond3 are equivalent.

>>> from condition import Condition, FieldList
>>> fl = FieldList(['A', 'B', 'C'])
>>> cond1 = Condition.parse("(fl.A>T('20000101')) & (fl.B==['b1', 'b2']) & (fl.C>=100)")
>>> cond2 = Condition.parse("And([fl.A>T('20000101'), fl.B==['b1', 'b2'], fl.C>=100])")
>>> cond3 = Condition.parse(repr(cond1))
Parameters
  • condition_str (str) – the string containing the condition expression.

  • field_list (Optional[condition._condition.FieldList]) – the FieldList object. If None, look up from the caller’s context.

  • field_list_name (str) – the field list name used in the condition_str parameter. Defaults to ‘fl’.

Return type

condition._condition.Condition

class condition.FieldCondition(field, op, val)[source]

A condition which compares a field with a value or tests whether a field is in, or not in, a set of values.

Parameters
  • field (condition._condition.Field) –

  • op (condition._condition.Operator) –

  • val (Any) –

class condition.CompositeCondition(conditions=None)[source]
Parameters

conditions (Optional[List[condition._condition.Condition]]) –

apply_to_subs(application, **kwargs)[source]

Recursively applies the application to the sub conditions.

Parameters

application (condition._condition.ConditionApplication) –

class condition.And(conditions=None)[source]

An ‘and’ condition composed of a list of sub conditions. Usage examples:

>>> import pandas as pd
>>> from condition import And, FieldList
>>> fl = FieldList(['f1', 'f2', 'f3'])
>>> condition = And([
...            fl.f1 <= 300,
...            fl.f2 > pd.to_datetime('20000101'),
...            fl.f3 == (['val1', 'val2'])
...         ])

Alternatively, it can be created as follows:

>>> condition2 = (fl.f1 <= 300) & (fl.f2 > pd.to_datetime('20000101')) & (fl.f3 == (['val1', 'val2']))
Parameters

conditions (Optional[List[condition._condition.Condition]]) –

class condition.Or(conditions=None)[source]

An ‘or’ condition composed of a list of sub conditions. Usage examples:

>>> import pandas as pd
>>> from condition import FieldList, Or
>>> fl = FieldList(['f1', 'f2', 'f3'])
>>> condition = Or([fl.f1 <= 300,
...     fl.f2 > pd.to_datetime('20000101'),
...     fl.f3 == (['val1', 'val2'])])
>>> condition2 = ((fl.f1 <= 300)
...     | (fl.f2 > pd.to_datetime('20000101'))
...     | (fl.f3 == (['val1', 'val2'])))
Parameters

conditions (Optional[List[condition._condition.Condition]]) –

condition.sql module

condition.sql.render_sql(sql_template, condition, dbmap=None)[source]

Renders a jinja2 SQL template with the dict from condition.to_sql_dict(). Optionally overrides field names with dbmap. See also the usage examples.

Parameters
  • sql_template (str) – a jinja2 sql template.

  • condition (condition._condition.Condition) – for generating the dict of conditions to be used in sql

  • dbmap (Optional[dict]) – optionally overrides field names.

Raises

UndefinedError – if a variable in sql template is undefined

Return type

str