mlm_insights.mlm_native.readers package

Submodules

mlm_insights.mlm_native.readers.csv_native_data_reader module

class mlm_insights.mlm_native.readers.csv_native_data_reader.CSVNativeDataReader(file_path: List[str] | str = '', data_source: DataSource | None = None, **kwargs: Any)

Bases: NativeDataReader

This data reader reads CSV files using the native (pandas) execution engine.
It can read from both the local file system and OCI Object Storage.

Configuration

file_path: Union[List[str], str]
  • The path or list of paths to CSV files.

data_source: Optional[DataSource]
  • A DataSource object to read data from.

Sample code

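For reading from file_path, which can be a string or a list of strings
    from mlm_insights.mlm_native.readers.csv_native_data_reader import CSVNativeDataReader

    test_files = [
        'data/csv/2000-01-01.csv',
        'data/csv/2000-01-30.csv'
    ]
    csv_reader = CSVNativeDataReader(file_path=test_files)
    actual_df = csv_reader.read(None)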

For reading using a data source
    data_source_args = {
        'bucket_name': bucket_name,
        'namespace': namespace,
        'object_prefix': object_prefix,
        'file_type': 'csv',
        # storage_options is used to authenticate the file systems
        'storage_options': {"config": "~/.oci/config"}
    }
    file_location = 'oci://%s@%s/%s' % (bucket_name, namespace, object_prefix)
    ds = SomeDataSource(file_path=file_location, **data_source_args)
    csv_reader = CSVNativeDataReader(data_source=ds)
    actual_df = csv_reader.read(None)
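
The file_location string in the sample above follows the layout oci://<bucket>@<namespace>/<prefix>. As a minimal sketch, the same string can be built with a small helper. The name build_oci_uri and the placeholder values are hypothetical and not part of mlm_insights.

```python
def build_oci_uri(bucket_name: str, namespace: str, object_prefix: str) -> str:
    """Return a location of the form oci://<bucket>@<namespace>/<prefix>.

    Hypothetical helper for illustration; not part of mlm_insights.
    """
    # Strip a leading slash so the prefix does not produce a double slash.
    return f"oci://{bucket_name}@{namespace}/{object_prefix.lstrip('/')}"


# Placeholder values, for illustration only.
file_location = build_oci_uri("my-bucket", "my-namespace", "data/csv/")
print(file_location)
```

The resulting string can then be passed as file_path to the data source, as in the sample.
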
classmethod create(config: Dict[str, Any]) → CSVNativeDataReader

Factory method to create an instance of CSVNativeDataReader from a configuration dictionary.

Parameters

config (Dict[str, Any]):
  • A dictionary containing configuration information.

Configuration

file_path: Union[List[str], str]
  • The path or list of paths to CSV files.

Returns

CSVNativeDataReader: An instance of CSVNativeDataReader.

read(schema_provider: SchemaProvider, **kwargs: Any) → DataFrame

Reads the data from the local file system / OCI file system.

Parameters

schema_provider: SchemaProvider
  • dtypes of the columns present

Other parameters

storage_options:
  • {"config": "~/.oci/config"} to authenticate the file systems

kwargs:
  • Extra keyword arguments to forward to pandas.read_csv().

Returns

pandas.DataFrame:
  • Result of reading the data from the local file system / OCI file system.
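
Because the extra keyword arguments are forwarded to pandas.read_csv(), any standard read_csv option can in principle be supplied. The sketch below calls pandas directly rather than the reader, so it only illustrates what such options do. The inline CSV text and the column names are made up for the example.

```python
import io

import pandas as pd

# Inline CSV text stands in for a file on disk or in Object Storage.
csv_text = "id,city,score\n1,Oslo,0.5\n2,Lima,0.9\n"

# Options a caller might pass as **kwargs; both are standard
# pandas.read_csv parameters.
read_kwargs = {
    "usecols": ["id", "score"],                    # keep only these columns
    "dtype": {"id": "int64", "score": "float64"},  # explicit column dtypes
}

df = pd.read_csv(io.StringIO(csv_text), **read_kwargs)
print(df.dtypes)
```
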

mlm_insights.mlm_native.readers.jsonl_native_data_reader module

class mlm_insights.mlm_native.readers.jsonl_native_data_reader.JsonlNativeDataReader(file_path: List[str] | str = '', data_source: DataSource | None = None, **kwargs: Any)

Bases: NativeDataReader

This data reader reads JSONL files using the native (pandas) execution engine.
It can read from both the local file system and OCI Object Storage.

Configuration

file_path: Union[List[str], str]
  • The path or list of paths to JSONL files.

data_source: Optional[DataSource]
  • A DataSource object to read data from.

Sample code

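For reading from file_path, which can be a string or a list of strings
    from mlm_insights.mlm_native.readers.jsonl_native_data_reader import JsonlNativeDataReader

    test_files = [
        'data/jsonl/2000-01-01.jsonl',
        'data/jsonl/2000-01-30.jsonl'
    ]
    jsonl_reader = JsonlNativeDataReader(file_path=test_files)
    actual_df = jsonl_reader.read(None)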

For reading using a data source
    data_source_args = {
        'bucket_name': bucket_name,
        'namespace': namespace,
        'object_prefix': object_prefix,
        'file_type': 'jsonl',
        # storage_options is used to authenticate the file systems
        'storage_options': {"config": "~/.oci/config"}
    }

    file_location = 'oci://%s@%s/%s' % (bucket_name, namespace, object_prefix)
    ds = SomeDataSource(file_path=file_location, **data_source_args)
    jsonl_reader = JsonlNativeDataReader(data_source=ds)
    actual_df = jsonl_reader.read(None)
classmethod create(config: Dict[str, Any]) → JsonlNativeDataReader

Factory method to create an instance of JsonlNativeDataReader from a configuration dictionary.

Parameters

config (Dict[str, Any]):
  • A dictionary containing configuration information.

Configuration

file_path: Union[List[str], str]
  • The path or list of paths to JSONL files.

Returns

JsonlNativeDataReader: An instance of JsonlNativeDataReader.

read(schema_provider: SchemaProvider, **kwargs: Any) → DataFrame

Reads the data from the local file system / OCI file system.

Parameters

schema_provider: SchemaProvider
  • dtypes of the columns present

Other parameters

storage_options:
  • {"config": "~/.oci/config"} to authenticate the file systems

kwargs :
  • Extra keyword arguments to forward to pandas.read_json().

Returns

pandas.DataFrame:
  • Result of reading the data from the local file system / OCI file system.
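
For orientation, this is how pandas itself parses JSON Lines input. The sketch calls pandas.read_json() directly and sets lines=True explicitly. Whether the reader sets that option internally is an assumption not confirmed by this page, and the sample records are invented.

```python
import io

import pandas as pd

# Two JSON Lines records: one JSON object per line.
jsonl_text = '{"id": 1, "city": "Oslo"}\n{"id": 2, "city": "Lima"}\n'

# lines=True tells pandas to treat each line as a separate JSON object.
df = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(df)
```
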

mlm_insights.mlm_native.readers.nested_json_native_data_reader module

class mlm_insights.mlm_native.readers.nested_json_native_data_reader.NestedJsonNativeDataReader(file_path: List[str] | str = '', query: str = '', query_engine_name: str = '', data_source: DataSource | None = None, **kwargs: Any)

Bases: NativeDataReader

This data reader extracts data from nested JSON files using the native (pandas) execution engine.
It can read from both the local file system and OCI Object Storage.

Configuration

file_path: Union[List[str], str]
  • The path or list of paths to JSON files.

data_source: Optional[DataSource]
  • A DataSource object to read data from.

query: str
  • A query string to extract data from the JSON files.

query_engine_name: str
  • Name of query engine to run the query. Currently, only JMESPATH is supported.

Sample code

For reading from file_path, which can be a string or a list of strings
    from mlm_insights.mlm_native.readers.nested_json_native_data_reader import NestedJsonNativeDataReader

    test_files = [
        'data/json/2000-01-01.json',
        'data/json/2000-01-30.json'
    ]
    query = "user defined query"
    query_engine_name = "JMESPATH"

    nested_json_reader = NestedJsonNativeDataReader(file_path=test_files, query=query, query_engine_name=query_engine_name)
    actual_df = nested_json_reader.read(None)

For reading using a data source
    data_source_args = {
        'bucket_name': bucket_name,
        'namespace': namespace,
        'object_prefix': object_prefix,
        'file_type': 'jsonl',
        # storage_options is used to authenticate the file systems
        'storage_options': {"config": "~/.oci/config"}
    }

    file_location = 'oci://%s@%s/%s' % (bucket_name, namespace, object_prefix)
    ds = SomeDataSource(file_path=file_location, **data_source_args)
    query = "user defined query"
    query_engine_name = "JMESPATH"
    nested_json_reader = NestedJsonNativeDataReader(data_source=ds, query=query, query_engine_name=query_engine_name)
    actual_df = nested_json_reader.read(None)
classmethod create(config: Dict[str, Any]) → NestedJsonNativeDataReader

Factory method to create an instance of NestedJsonNativeDataReader from a configuration dictionary.

Parameters

config (Dict[str, Any]):
  • A dictionary providing config inputs such as FILE_PATH_KEY or DATA_SOURCE, QUERY and QUERY_ENGINE_NAME.

Configuration

file_path: Union[List[str], str]
  • The path or list of paths to JSON files.

data_source: Optional[DataSource]
  • A DataSource object to read data from.

query: str
  • A query string to extract data from the JSON files.

query_engine_name: str
  • Name of query engine to run the query. Currently, only JMESPATH is supported.

Returns

NestedJsonNativeDataReader: An instance of NestedJsonNativeDataReader.

read(schema_provider: SchemaProvider, **kwargs: Any) → DataFrame

Reads the data from the local file system / OCI file system.

Parameters

schema_provider: SchemaProvider
  • dtypes of the columns present

Other parameters

kwargs:
  • Extra keyword arguments to forward to pandas.DataFrame.

Returns

pandas.DataFrame
  • Result of reading the data from the local file system / OCI file system.

Module contents