API Reference
condition module
-
class condition.FieldList(fields)
Exposes each item of the list as a field attribute, which can then be used to construct field conditions.
- Parameters
fields (Collection[str]) –
-
class condition.Condition
Represents a condition object. It is immutable.
-
apply(application, **kwargs)
Applies the ConditionApplication to this condition. This is an extension mechanism that lets you implement the condition for different usage contexts.
- Parameters
application (condition._condition.ConditionApplication) –
-
static register_application(name, application)
Syntactic sugar that lets you call your ConditionApplication as if it were built into the Condition class. Afterwards, you can call cond.<name>(), which is actually cond.apply(application()).
- Parameters
name (str) – the method name. This must be unique.
application (condition._condition.ConditionApplication) – your application class or object to be called by this method. If it is an object, your object must be able to handle concurrent calls. If it is a class, it must have a no-arg constructor, and a new object will be created for each call.
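The registration pattern described above can be sketched in plain Python. This is only an illustration of the idea (attach a method by name that forwards to apply), not the library's implementation. MiniCondition, UpperCaseApplication, and the assumption that an application is a callable taking the condition are all hypothetical names and conventions chosen for the example.

```python
import inspect


class MiniCondition:
    """Hypothetical stand-in for Condition, used only for this sketch."""

    def __init__(self, text):
        self.text = text

    def apply(self, application, **kwargs):
        # Assumed protocol: an application is a callable taking the condition.
        return application(self, **kwargs)

    @staticmethod
    def register_application(name, application):
        # The method name must be unique on the class.
        if hasattr(MiniCondition, name):
            raise ValueError(f"method name already in use: {name}")

        def method(self, **kwargs):
            # A class gets a fresh instance per call; an object is reused,
            # which is why a registered object must tolerate concurrent calls.
            app = application() if inspect.isclass(application) else application
            return self.apply(app, **kwargs)

        setattr(MiniCondition, name, method)


class UpperCaseApplication:
    """Hypothetical application: renders the condition text in upper case."""

    def __call__(self, condition, **kwargs):
        return condition.text.upper()


MiniCondition.register_application("to_upper", UpperCaseApplication)
cond = MiniCondition("a > 1")
result = cond.to_upper()  # equivalent to cond.apply(UpperCaseApplication())
```

Registering a class rather than an instance avoids shared state between calls, at the cost of constructing a new object each time.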
-
set_param(name, val)
Sets an additional param/value to pass to the end consumer. For example, the params can be used in SQL templates. Note that only the top condition’s params are used.
- Parameters
name (str) – the param name. It will be available in the jinja2 SQL template.
val (Any) – the value
- Return type
None
-
to_sql_where_condition(db_map=None, indent=1)
Generates a string representing the condition for use in a SQL where clause.
- Parameters
db_map (Optional[Dict[str, str]]) – map from a field name to a db field name. Note that you can also include an alias in the db field name. By default, field names are used directly.
indent (int) –
- Returns
condition string for sql where clause.
- Return type
str
-
get_all_field_conditions()
Returns all FieldCondition objects contained in this condition.
- Returns
a dict: field name -> list of FieldCondition for this field.
- Return type
collections.OrderedDict
-
to_sql_dict(dbmap=None)
Generates a dict to pass into a SQL template.
Before you write your SQL template, you can call this method and print out the dict (keys) to get an idea of what is available to use in your SQL template.
See also usage examples.
- Parameters
dbmap (Optional[Dict[str, str]]) – to map to the actual db field name (optionally with alias) when generating “where_condition”
- Returns
the dict
- Return type
Dict[str, Any]
-
to_df_query()
- Returns
a string representing the condition to be used in df.query()
- Return type
str
-
query(df)
Queries the passed-in dataframe with this condition.
- Parameters
df (pandas.core.frame.DataFrame) – the dataframe to query. Each field in the condition must match a column or an index level in the dataframe.
- Returns
a dataframe whose rows satisfy this condition.
- Return type
pandas.core.frame.DataFrame
-
static from_pyarrow_filter(filters=None)
Constructs a condition from pyarrow-style filters.
- Parameters
filters (Optional[Union[List[Tuple], List[List[Tuple]]]]) – pyarrow filters. See pyarrow_read_table.
- Return type
condition._condition.Condition
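The two filter shapes in the type annotation follow the pyarrow convention: a flat list of (column, op, value) tuples is a single AND group, while a list of lists is an OR of AND groups. The sketch below shows that structure by evaluating filters against a plain dict. It uses neither pyarrow nor the condition library, and matches is a hypothetical helper written for this illustration.

```python
import operator

# Operators accepted in pyarrow-style filter tuples (column, op, value).
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def matches(filters, record):
    """Return True if `record` (a dict) satisfies the pyarrow-style filters."""
    if not filters:
        return True  # no filter: every record matches
    # A flat list of tuples is one AND group; wrap it for uniform handling.
    groups = [filters] if isinstance(filters[0], tuple) else filters
    return any(
        all(OPS[op](record[col], val) for col, op, val in group)
        for group in groups
    )


# Flat form: year >= 2020 AND region in {EU, US}
flat = [("year", ">=", 2020), ("region", "in", ["EU", "US"])]
# Nested form: (year < 2000) OR (region == EU)
nested = [[("year", "<", 2000)], [("region", "==", "EU")]]
record = {"year": 2021, "region": "EU"}
```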
-
eval(record_dict, type_conversion=False)
Evaluates the condition against the record and returns True or False. Note that if you have a large number of records, the recommended way to evaluate all of them in batch mode is to create a pandas DataFrame from the records and then call condition.query(df). You can install the numexpr package for much faster performance.
- Parameters
record_dict (Dict) – a dict from a field to a value. Used to test FieldCondition.
type_conversion (bool) – if True, convert the value in record_dict to the FieldCondition value type before comparison. Sometimes such conversion is needed, for example, in pyarrow partition filtering.
- Return type
bool
-
normalize()
Normalizes the condition to be one of the following:
a FieldCondition
an And with a list of sub FieldCondition
an Or with a list of sub conditions as defined above.
In some cases, e.g., pyarrow filtering, the above restrictions must be followed. Any condition can be normalized to the above form in an equivalent way.
For example, (a | b) & (c | d) & e will be normalized to (a & c & e) | (a & d & e) | (b & c & e) | (b & d & e).
- Returns
an equivalent normalized condition.
- Return type
condition._condition.Condition
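The normal form described above is an OR of AND groups (disjunctive normal form). A minimal sketch of the distribution step, assuming a toy representation in which atomic conditions are strings and composites are ("and" | "or", [sub-expressions]) tuples; to_dnf is a hypothetical helper and says nothing about how the library performs the normalization internally.

```python
from itertools import product


def to_dnf(expr):
    """Return the expression as a list of AND groups (an OR of ANDs).

    `expr` is either a string (an atomic condition) or a tuple
    ("and" | "or", [sub-expressions]).
    """
    if isinstance(expr, str):
        return [[expr]]
    kind, subs = expr
    sub_dnfs = [to_dnf(sub) for sub in subs]
    if kind == "or":
        # OR: concatenate the AND groups of all branches.
        return [group for dnf in sub_dnfs for group in dnf]
    # AND: pick one AND group from each branch and merge them.
    return [
        [atom for group in combo for atom in group]
        for combo in product(*sub_dnfs)
    ]


# (a | b) & (c | d) & e
expr = ("and", [("or", ["a", "b"]), ("or", ["c", "d"]), "e"])
dnf = to_dnf(expr)
# dnf == [["a", "c", "e"], ["a", "d", "e"], ["b", "c", "e"], ["b", "d", "e"]]
```

Because every AND over ORs multiplies the number of groups, the normalized form can grow quickly for deeply nested conditions.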
-
to_pyarrow_filter()
Generates filters that can be passed to pyarrow.parquet.ParquetDataset or pandas.read_parquet in order to read only the selected partitions, thereby increasing efficiency. Please note that field conditions not matching a partition key will be ignored, so you should follow up with condition.query(df) to filter out unnecessary rows.
See also usage examples.
- Return type
Union[List[Tuple], List[List[Tuple]]]
-
add_date_condition(date_field, from_date=None, to_date=None, to_exclusive=False, date_format=None)
Adds to this condition the requirement that the date field fall within the passed-in date range. This is a convenience method for working with time series.
- Parameters
date_field (condition._condition.Field) – the date field
from_date (Optional[Union[str, datetime.datetime]]) – if not None, the date field must be greater than or equal to this datetime value
to_date (Optional[Union[str, datetime.datetime]]) – if not None, the date field must be less than this datetime value, or equal to it when to_exclusive is False
to_exclusive (Optional[bool]) – if False, the date field can be equal to the to_date
date_format (Optional[str]) – the format used to convert the date to a str. The default is None, meaning no conversion.
- Return type
condition._condition.Condition
-
add_daterange_overlap_condition(from_date_field=None, to_date_field=None, from_date=None, to_date=None, to_exclusive=False, date_format=None)
Adds to this condition the requirement that the range spanned by the two date fields overlap with the passed-in date range. This is a convenience method for working with time series.
- Parameters
from_date_field (Optional[condition._condition.Field]) – the from date field
to_date_field (Optional[condition._condition.Field]) – the to date field
from_date (Optional[Union[str, datetime.datetime]]) – if not None, the to_date_field must be greater than or equal to this datetime value
to_date (Optional[Union[str, datetime.datetime]]) – if not None, the from_date_field must be less than this datetime value, or equal to it when to_exclusive is False
to_exclusive (Optional[bool]) – if False, the from_date_field can be equal to the to_date
date_format (Optional[str]) – the format used to convert the date to a str. The default is None, meaning no conversion.
- Return type
condition._condition.Condition
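The overlap rule in the parameter descriptions above can be written out as a small predicate on one row. This is a sketch of the documented semantics only: overlaps is a hypothetical helper, row_from and row_to stand for the values of from_date_field and to_date_field in a row, and the library itself builds a condition object rather than evaluating rows this way.

```python
from datetime import datetime


def overlaps(row_from, row_to, from_date=None, to_date=None, to_exclusive=False):
    """Return True if the row's [row_from, row_to] range overlaps the query range."""
    # The row must not end before the query range starts.
    if from_date is not None and row_to < from_date:
        return False
    if to_date is not None:
        # The row must start before the query range ends; starting exactly
        # at to_date is allowed only when to_exclusive is False.
        if to_exclusive and row_from >= to_date:
            return False
        if not to_exclusive and row_from > to_date:
            return False
    return True


q_from, q_to = datetime(2020, 1, 1), datetime(2020, 12, 31)

inside = overlaps(datetime(2020, 3, 1), datetime(2020, 4, 1), q_from, q_to)
before = overlaps(datetime(2019, 1, 1), datetime(2019, 6, 1), q_from, q_to)
# A row that starts exactly on to_date counts only when to_exclusive is False.
edge_inclusive = overlaps(q_to, datetime(2021, 6, 1), q_from, q_to)
edge_exclusive = overlaps(q_to, datetime(2021, 6, 1), q_from, q_to, to_exclusive=True)
```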
-
visualize(filename=None, view=False)
Visualizes this condition structure as a ‘png’ image. This method requires the graphviz package to be available.
- Parameters
filename – the path to output the ‘png’ file.
view (bool) – if True, show the picture
- Return type
Any
-
split(fields, field_map=None)
Splits the condition into a new condition which contains only the passed-in fields. This method is used in the following scenario:
A combined data item is joined from two or more sub data sources.
The condition is defined on the combined data.
Use this method to get a split condition to be applied to the sub data sources with the fields list in the sub data sources.
There may be a field mapping from this condition to the target sub data sources. If so, the split will be mapped to the target fields.
After the data is joined, apply the original condition on the combined data.
- Parameters
fields (Union[str, condition._condition.Field, condition._condition.FieldList, Collection[Union[str, condition._condition.Field]]]) – a FieldList or a collection of target fields (str or Field) to retain.
field_map (Optional[Union[Dict[str, str], Dict[condition._condition.Field, condition._condition.Field]]]) – map from a field in this condition to the target field. If None, keep the field name.
- Returns
the condition to be applied for a data source with only the passed-in fields. Returns None if no condition should be applied, namely, assuming True for each row.
- Return type
condition._condition.Condition
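The scenario above can be illustrated for the simplest case, a flat AND of field comparisons. The sketch below is an assumption-laden toy: conditions are (field, op, value) tuples, split_and is a hypothetical helper, and the field names are made up. It does not show how the library treats Or or nested conditions.

```python
def split_and(conjuncts, fields, field_map=None):
    """Keep only the conjuncts that can be evaluated on the target fields.

    `conjuncts` is a list of (field, op, value) tuples joined by AND.
    `fields` are the field names available in the sub data source.
    Returns None when nothing remains, meaning "no filtering".
    """
    field_map = field_map or {}
    kept = []
    for field, op, value in conjuncts:
        # Translate to the target name first, then test availability.
        target = field_map.get(field, field)
        if target in fields:
            kept.append((target, op, value))
    return kept or None


# Condition defined on the combined (joined) data.
combined = [
    ("trade_date", ">=", "2020-01-01"),
    ("price", ">", 100),
    ("rating", "==", "AA"),
]

# The prices source calls the date column "dt" and has no "rating" column.
prices_cond = split_and(combined, {"dt", "price"}, {"trade_date": "dt"})
# A source sharing none of the fields gets no condition at all.
issuer_cond = split_and(combined, {"issuer"})
```

Dropping conjuncts from an AND only loosens the filter, so each sub data source may return extra rows; that is why the original condition is applied again after the join.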
-
static parse(condition_str, field_list=None, field_list_name='fl')
Parses a str into a condition object. The parse method is safe in that no irrelevant function/class can be called in the string. T() is a shortcut for pd.to_datetime() to convert a string to a datetime.
Examples: Below, cond1, cond2 and cond3 are equivalent.
>>> fl = FieldList(['A', 'B', 'C'])
>>> cond1 = Condition.parse("(fl.A>T('20000101')) & (fl.B==['b1', 'b2']) & (fl.C>=100)")
>>> cond2 = Condition.parse("And([fl.A>T('20000101'), fl.B==['b1', 'b2'], fl.C>=100])")
>>> cond3 = Condition.parse(repr(cond1))
- Parameters
condition_str (str) – the string containing the condition expression.
field_list (Optional[condition._condition.FieldList]) – the FieldList object. If None, it is looked up from the caller’s context.
field_list_name (str) – the field list name used in the condition_str parameter. Defaults to ‘fl’.
- Return type
condition._condition.Condition
-
-
class condition.FieldCondition(field, op, val)
A condition which compares a field with a value or tests whether a field is in/not in a set of values.
- Parameters
field (condition._condition.Field) –
op (condition._condition.Operator) –
val (Any) –
-
class condition.CompositeCondition(conditions=None)
- Parameters
conditions (Optional[List[condition._condition.Condition]]) –
-
class condition.And(conditions=None)
An ‘and’ condition composed of a list of sub conditions. Usage examples:
>>> fl = FieldList(['f1', 'f2', 'f3'])
>>> condition = And([
...     fl.f1 <= 300,
...     fl.f2 > pd.to_datetime('20000101'),
...     fl.f3 == (['val1', 'val2'])
... ])
Alternatively, it can be created as follows:
>>> condition2 = (fl.f1 <= 300) & (fl.f2 > pd.to_datetime('20000101')) & (fl.f3 == (['val1', 'val2']))
- Parameters
conditions (Optional[List[condition._condition.Condition]]) –
-
class condition.Or(conditions=None)
An ‘or’ condition composed of a list of sub conditions. Usage examples:
>>> fl = FieldList(['f1', 'f2', 'f3'])
>>> condition = Or([fl.f1 <= 300,
...                 fl.f2 > pd.to_datetime('20000101'),
...                 fl.f3 == (['val1', 'val2'])])
>>> condition2 = ((fl.f1 <= 300)
...               | (fl.f2 > pd.to_datetime('20000101'))
...               | (fl.f3 == (['val1', 'val2'])))
- Parameters
conditions (Optional[List[condition._condition.Condition]]) –
condition.sql module
-
condition.sql.render_sql(sql_template, condition, dbmap=None)
Renders a jinja2 SQL template with the dict from condition.to_sql_dict(). Optionally overwrites field names with dbmap. Please see also usage examples.
- Parameters
sql_template (str) – a jinja2 SQL template.
condition (condition._condition.Condition) – for generating the dict of conditions to be used in the SQL
dbmap (Optional[dict]) – optionally overwrite field names.
- Raises
UndefinedError – if a variable in sql template is undefined
- Return type
str