moDiNA

Submodules

modina.context_net_inference module

modina.context_net_inference.calculate_association_scores(ord_data, nom_data, cont_data, bi_data, test_type='nonparametric', num_workers=1, nan_value=-89)
Return type:

DataFrame

modina.context_net_inference.compute_context_scores(context_data, meta_file, test_type='nonparametric', correction='bh', num_workers=1, path=None, nan_value=None)

Compute association scores for a given context.

Parameters:
  • context_data (DataFrame) – The raw context data (rows: samples, columns: variables).

  • meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.

  • test_type (str) – Type of tests to use for network inference. Defaults to ‘nonparametric’.

  • correction (str) – Correction method for multiple testing. Defaults to ‘bh’.

  • num_workers (int) – Number of workers for parallel processing. Defaults to 1.

  • path (Optional[str]) – Optional path to save the computed scores as a CSV file. Defaults to None.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

Return type:

DataFrame

Returns:

A pd.DataFrame containing the computed association scores.
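
As a minimal sketch of the inputs described above, the snippet below builds a small context table and a matching metadata table with pandas, then encodes missing entries with a numeric sentinel of the kind that would be passed as nan_value. The variable names, the type strings ('continuous', 'binary') and the sentinel -89 (borrowed from the defaults of the lower-level functions in this module) are illustrative assumptions; the accepted type labels should be checked against the package itself.

```python
import pandas as pd

# Hypothetical context data: rows are samples, columns are variables.
context_data = pd.DataFrame(
    {
        "age": [34.0, 51.0, float("nan"), 47.0],
        "smoker": [0, 1, 1, 0],
    }
)

# Metadata with the documented 'label' and 'type' columns.
# The type strings are placeholders, not confirmed values.
meta_file = pd.DataFrame(
    {
        "label": ["age", "smoker"],
        "type": ["continuous", "binary"],
    }
)

# Every column of the context data should be described in the metadata.
undescribed = set(context_data.columns) - set(meta_file["label"])

# Pick a numeric sentinel that does not already occur in the data,
# then use it to encode the missing entries.
NAN_VALUE = -89
sentinel_is_free = not (context_data == NAN_VALUE).any().any()
encoded = context_data.fillna(NAN_VALUE)
```

With data prepared this way, encoded and meta_file would be the first two arguments of compute_context_scores, and NAN_VALUE would be passed as nan_value.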

modina.context_net_inference.napy_bi_cont(cont_phenotypes, bi_phenotypes, test='nonparametric', num_workers=8, nan_value=-89)
modina.context_net_inference.napy_bi_nom(nom_phenotypes, bi_phenotypes, num_workers=8, nan_value=-89)
modina.context_net_inference.napy_bi_ord(ord_phenotypes, bi_phenotypes, num_workers=8, nan_value=-89)
modina.context_net_inference.napy_cont_cont(cont_phenotypes, test='nonparametric', num_workers=8, nan_value=-89)
modina.context_net_inference.napy_nom_cont(cont_phenotypes, nom_phenotypes, test='nonparametric', num_workers=8, nan_value=-89)
modina.context_net_inference.napy_ord_cont(cont_phenotypes, ord_phenotypes, num_workers=8, nan_value=-89)
modina.context_net_inference.napy_ord_nom(ord_phenotypes, nom_phenotypes, num_workers=8, nan_value=-89)

modina.context_simulation module

modina.context_simulation.save_gt(groundtruths, path, mode='node')
modina.context_simulation.simulate_copula(path=None, name1='context1', name2='context2', n_bi=50, n_cont=50, n_cat=50, n_samples=500, n_shift_cont=0, n_shift_bi=0, n_shift_cat=0, n_corr_cont_cont=0, n_corr_bi_bi=0, n_corr_cat_cat=0, n_corr_bi_cont=0, n_corr_bi_cat=0, n_corr_cont_cat=0, n_both_cont_cont=0, n_both_bi_bi=0, n_both_cat_cat=0, n_both_bi_cont=0, n_both_bi_cat=0, n_both_cont_cat=0, shift=0.5, corr=0.7)

Simulate two contexts with binary, continuous and categorical nodes using a Gaussian copula.

Parameters:
  • path – Path to save the simulated contexts, the meta file and the ground truth information. If None, files are not saved.

  • name1 – Name of the first context.

  • name2 – Name of the second context.

  • n_bi – Number of binary nodes to simulate.

  • n_cont – Number of continuous nodes to simulate.

  • n_cat – Number of categorical nodes to simulate.

  • n_samples – Number of samples per context.

  • n_shift_cont – Number of continuous nodes with an artificially introduced mean shift.

  • n_shift_bi – Number of binary nodes with an artificially introduced mean shift.

  • n_shift_cat – Number of categorical nodes with an artificially introduced mean shift.

  • n_corr_cont_cont – Number of continuous node pairs with an artificially introduced correlation difference.

  • n_corr_bi_bi – Number of binary node pairs with an artificially introduced correlation difference.

  • n_corr_cat_cat – Number of categorical node pairs with an artificially introduced correlation difference.

  • n_corr_bi_cat – Number of binary-categorical node pairs with an artificially introduced correlation difference.

  • n_corr_cont_cat – Number of continuous-categorical node pairs with an artificially introduced correlation difference.

  • n_corr_bi_cont – Number of binary-continuous node pairs with an artificially introduced correlation difference.

  • n_both_cont_cont – Number of continuous node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_bi_bi – Number of binary node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_cat_cat – Number of categorical node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_bi_cat – Number of binary-categorical node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_cont_cat – Number of continuous-categorical node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_bi_cont – Number of binary-continuous node pairs with both an artificially introduced mean shift and correlation difference.

  • shift – Magnitude of the mean shift.

  • corr – Magnitude of the correlation difference (measured as correlation coefficient between 0 and 1).

Returns:

A tuple containing the two simulated contexts, a meta file and the ground truth nodes:

  • context1 – pd.DataFrame of the first simulated context.

  • context2 – pd.DataFrame of the second simulated context.

  • meta – pd.DataFrame containing the data type for each simulated variable.

  • ground_truth – A tuple containing three lists of ground truth nodes: (shift_nodes, corr_nodes, shift_corr_nodes).
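
To illustrate the general idea of a Gaussian copula with an introduced correlation difference, the sketch below draws one continuous and one binary variable per context from correlated standard normals, maps them to uniforms with the normal CDF, and applies simple inverse marginals. It uses only the standard library and is not the implementation of simulate_copula; the helper names simulate_pair and mean_gap, the exponential marginal, and the parameter p_binary are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def simulate_pair(n_samples, corr, p_binary=0.5, rate=1.0, seed=0):
    """Draw a continuous and a binary variable coupled by a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    continuous, binary = [], []
    for _ in range(n_samples):
        # Two standard normals with correlation `corr`.
        z1 = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        z2 = corr * z1 + math.sqrt(1.0 - corr ** 2) * eps
        # Map to uniforms with the normal CDF (the copula step).
        u1 = std_normal.cdf(z1)
        u2 = std_normal.cdf(z2)
        # Apply inverse marginals: exponential for the continuous node,
        # Bernoulli(p_binary) for the binary node.
        continuous.append(-math.log(1.0 - u1) / rate)
        binary.append(1 if u2 > 1.0 - p_binary else 0)
    return continuous, binary


def mean_gap(continuous, binary):
    """Difference of the continuous mean between the two binary groups."""
    ones = [c for c, b in zip(continuous, binary) if b == 1]
    zeros = [c for c, b in zip(continuous, binary) if b == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


# Context 1: independent pair. Context 2: same marginals, latent correlation 0.7.
cont1, bin1 = simulate_pair(n_samples=2000, corr=0.0, seed=1)
cont2, bin2 = simulate_pair(n_samples=2000, corr=0.7, seed=2)
gap1 = mean_gap(cont1, bin1)
gap2 = mean_gap(cont2, bin2)
```

The marginal distributions are the same in both contexts, so any difference between gap1 and gap2 comes from the dependence structure alone, which is the kind of signal the n_corr_* parameters describe.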

modina.diff_net_construction module

modina.diff_net_construction.compute_diff_edges(scores1, scores2, edge_metric, max_path_length=2, path=None)

Compute differential edge scores based on the specified edge metric.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.

  • scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.

  • edge_metric (str) – Edge metric to compute the differential edge scores.

  • max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.

  • path (Optional[str]) – Optional path to save the differential edge scores as a CSV file. Defaults to None.

Return type:

DataFrame

Returns:

A DataFrame containing the computed differential edge scores.
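
The role of a path-length limit can be illustrated with a small, purely hypothetical scoring scheme: for each node pair, sum the products of edge weights along all walks of up to max_path_length steps, then compare the two contexts. This is a sketch under that assumption and not the formula moDiNA uses for its integrated interaction scores; the function names and the absolute-difference comparison are invented for the example.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def walk_scores(weights, max_path_length=2):
    """Sum the weight products over all walks of length 1..max_path_length."""
    n = len(weights)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in weights]
    for _ in range(max_path_length):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        power = matmul(power, weights)
    return total


# Nodes A, B, C. Context 1 has edges A-B (0.5) and B-C (0.4), no direct A-C edge.
context1_weights = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.4],
    [0.0, 0.4, 0.0],
]
# Context 2 lacks the B-C edge.
context2_weights = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

direct_only = walk_scores(context1_weights, max_path_length=1)
with_two_step = walk_scores(context1_weights, max_path_length=2)

# Hypothetical differential score: absolute difference between contexts.
s1 = walk_scores(context1_weights, max_path_length=2)
s2 = walk_scores(context2_weights, max_path_length=2)
diff_a_c = abs(s1[0][2] - s2[0][2])
```

With a limit of 1 the pair (A, C) scores zero in context 1, while a limit of 2 picks up the indirect connection through B (0.5 × 0.4 = 0.2), so a larger limit lets indirect associations contribute to the comparison.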

modina.diff_net_construction.compute_diff_network(scores1, scores2, context1, context2, edge_metric=None, node_metric=None, max_path_length=2, correction='bh', path=None, format='csv', meta_file=None, test_type='nonparametric', nan_value=None)

Compute a differential network defined by a node metric and an edge metric.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.

  • scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.

  • context1 (DataFrame) – Observed data of Context 1, potentially filtered.

  • context2 (DataFrame) – Observed data of Context 2, potentially filtered.

  • edge_metric (Optional[str]) – Edge metric used to construct the differential network. Defaults to None.

  • node_metric (Optional[str]) – Node metric used to construct the differential network. Defaults to None.

  • max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.

  • correction (str) – Correction method for multiple testing. Defaults to ‘bh’.

  • path (Optional[str]) – Optional path to save the differential scores as CSV files. Defaults to None.

  • format (str) – File format to save the differential network. Options are ‘csv’ and ‘graphml’. Defaults to ‘csv’.

  • meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.

  • test_type (str) – Test type to use for continuous nodes in STC metric. Defaults to ‘nonparametric’.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

Return type:

Tuple[Optional[DataFrame], Optional[DataFrame]]

Returns:

A tuple (edges_diff, nodes_diff) containing the computed differential edges and nodes.

modina.diff_net_construction.compute_diff_nodes(scores1, scores2, context1, context2, node_metric, correction='bh', meta_file=None, test_type='nonparametric', nan_value=None, path=None)

Compute differential node scores based on the specified node metric.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.

  • scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.

  • context1 (DataFrame) – Observed data of Context 1, potentially filtered.

  • context2 (DataFrame) – Observed data of Context 2, potentially filtered.

  • node_metric (str) – Node metric to compute the differential node scores.

  • correction (str) – Correction method for multiple testing. Only needed if node_metric is ‘STC’. Defaults to ‘bh’.

  • meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.

  • test_type (str) – Test type to compare continuous variables across contexts for the ‘STC’ node metric. Defaults to ‘nonparametric’.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

  • path (Optional[str]) – Optional path to save the differential node scores as a CSV file. Defaults to None.

Return type:

DataFrame

Returns:

A DataFrame containing the computed differential node scores.
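
Assuming that the correction value ‘bh’ denotes the Benjamini-Hochberg procedure, the adjustment applied to a set of per-node p-values can be sketched as follows. The function name benjamini_hochberg and the example p-values are made up for illustration, and the package may compute the correction differently.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from comparing four variables across two contexts.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Each adjusted value is the smallest false discovery rate at which the corresponding test would still be called significant, so adjusted values never fall below the raw ones.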

modina.diff_net_construction.degree_centrality(nodes_diff, scores1, scores2, metric='pre-P', weighted=False)
modina.diff_net_construction.interaction_score(data, max_path_length=3, metric='pre-E')
modina.diff_net_construction.pagerank_centrality(nodes_diff, scores1, scores2, metric='pre-E')
modina.diff_net_construction.stat_test_centrality(context1, context2, meta_file, test_type='nonparametric', correction='bh', nan_value=None)

modina.edge_filtering module

modina.edge_filtering.filter(scores1, scores2, context1, context2, filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, path=None)

Filter association scores and context data based on the specified filtering configurations.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1.

  • scores2 (DataFrame) – Statistical association scores of Context 2.

  • context1 (DataFrame) – The first context for the differential network analysis.

  • context2 (DataFrame) – The second context for the differential network analysis.

  • filter_method (Optional[str]) – Method used for filtering. Defaults to None.

  • filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.

  • filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.

  • filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.

  • path (Optional[str]) – Optional path to save the filtered scores and context data as CSV files. Defaults to None.

Return type:

Tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Returns:

A tuple containing the filtered scores and context data.

modina.pipeline module

modina.pipeline.diffnet_analysis(context1, context2, meta_file, edge_metric=None, node_metric=None, ranking_alg='PageRank+', filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, max_path_length=2, test_type='nonparametric', nan_value=None, correction='bh', num_workers=1, project_path=None, name1='context1', name2='context2')

Wrapper function to perform an end-to-end differential network analysis following the moDiNA pipeline.

Parameters:
  • context1 (DataFrame) – Observed data of Context 1 (rows: samples, columns: variables).

  • context2 (DataFrame) – Observed data of Context 2 (rows: samples, columns: variables).

  • meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.

  • test_type (str) – Type of statistical tests to use for association score calculation. Defaults to ‘nonparametric’.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

  • correction (str) – Correction method for multiple testing. Defaults to ‘bh’.

  • num_workers (int) – Number of workers for parallel processing. Defaults to 1.

  • filter_method (Optional[str]) – Method used for filtering. Defaults to None.

  • filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.

  • filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.

  • filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.

  • edge_metric (Optional[str]) – Edge metric used to construct the differential network. Defaults to None.

  • node_metric (Optional[str]) – Node metric used to construct the differential network. Defaults to None.

  • max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.

  • ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’. Defaults to ‘PageRank+’.

  • name1 (str) – Name of Context 1. Used for saving files. Defaults to ‘context1’.

  • name2 (str) – Name of Context 2. Used for saving files. Defaults to ‘context2’.

  • project_path (Optional[str]) – Optional path to save results. Defaults to None.

Return type:

Tuple[list, dict, Optional[DataFrame], Optional[DataFrame], dict]

Returns:

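A tuple containing the computed ranking, the differential edges, the differential nodes, and the configuration parameters. Per the return type, the ranking occupies the first two elements: a list of ranked nodes followed by a dictionary, which matches the (ranked nodes, ranked nodes per data type) pair returned by compute_ranking.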

modina.ranking module

modina.ranking.compute_ranking(nodes_diff, edges_diff, ranking_alg, path=None, meta_file=None)

Compute a ranking based on the specified ranking algorithm.

Parameters:
  • nodes_diff (Optional[DataFrame]) – Differential node scores.

  • edges_diff (Optional[DataFrame]) – Differential edge scores.

  • ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’.

  • meta_file (Optional[DataFrame]) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable. Defaults to None.

  • path (Optional[str]) – Optional path to save the ranking as a CSV file. Defaults to None.

Return type:

Tuple[list, dict]

Returns:

A tuple containing the list of ranked nodes and a dictionary with ranked nodes per data type.
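
For readers unfamiliar with PageRank-style rankings on a differential network, the following standard-library sketch runs a personalized PageRank by power iteration, using differential edge scores as edge weights and differential node scores as teleport probabilities. Treating node scores as the personalization vector is only one plausible reading of the personalization option of modina.ranking.pagerank; the function name, the damping factor 0.85 and the input dictionaries are assumptions for this example and do not reproduce the package's algorithm.

```python
def personalized_pagerank_sketch(edge_weights, node_scores=None,
                                 damping=0.85, n_iter=100):
    """Rank nodes of an undirected weighted graph by personalized PageRank."""
    nodes = sorted({n for edge in edge_weights for n in edge})
    # Symmetric weighted adjacency built from the edge dictionary.
    neighbors = {n: {} for n in nodes}
    for (u, v), w in edge_weights.items():
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w
    # Teleport distribution: uniform unless node scores are given.
    if node_scores is None:
        node_scores = {n: 1.0 for n in nodes}
    total = sum(node_scores.get(n, 0.0) for n in nodes)
    teleport = {n: node_scores.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(n_iter):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            strength = sum(neighbors[u].values())
            if strength == 0.0:
                # Nodes without weighted edges pass their mass to the teleport set.
                for n in nodes:
                    new_rank[n] += damping * rank[u] * teleport[n]
                continue
            for v, w in neighbors[u].items():
                new_rank[v] += damping * rank[u] * w / strength
        rank = new_rank
    ranking = sorted(nodes, key=lambda n: rank[n], reverse=True)
    return ranking, rank


# Hypothetical differential edge scores between four variables.
edges = {("A", "B"): 0.9, ("B", "C"): 0.1, ("C", "D"): 0.1}

uniform_ranking, uniform_rank = personalized_pagerank_sketch(edges)

# Hypothetical differential node scores that single out node D.
node_scores = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 1.0}
pers_ranking, pers_rank = personalized_pagerank_sketch(edges, node_scores)
```

With uniform teleportation, node B ranks first because it carries the strongest differential edges; concentrating the teleport mass on D raises D's score, which shows how node-level evidence can shift an edge-driven ranking.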

modina.ranking.dimontrank(edges_diff, edge_metric, mode='abs')
modina.ranking.pagerank(edges_diff, edge_metric, nodes_diff=None, node_metric=None, personalization=True)

modina.statistics_utils module

modina.statistics_utils.post_rescaling(diff_scores, metric)
modina.statistics_utils.pre_rescaling(scores1, scores2, metric)

Module contents

modina.compute_context_scores(context_data, meta_file, test_type='nonparametric', correction='bh', num_workers=1, path=None, nan_value=None)

Compute association scores for a given context.

Parameters:
  • context_data (DataFrame) – The raw context data (rows: samples, columns: variables).

  • meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.

  • test_type (str) – Type of tests to use for network inference. Defaults to ‘nonparametric’.

  • correction (str) – Correction method for multiple testing. Defaults to ‘bh’.

  • num_workers (int) – Number of workers for parallel processing. Defaults to 1.

  • path (Optional[str]) – Optional path to save the computed scores as a CSV file. Defaults to None.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

Return type:

DataFrame

Returns:

A pd.DataFrame containing the computed association scores.

modina.compute_diff_edges(scores1, scores2, edge_metric, max_path_length=2, path=None)

Compute differential edge scores based on the specified edge metric.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.

  • scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.

  • edge_metric (str) – Edge metric to compute the differential edge scores.

  • max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.

  • path (Optional[str]) – Optional path to save the differential edge scores as a CSV file. Defaults to None.

Return type:

DataFrame

Returns:

A DataFrame containing the computed differential edge scores.

modina.compute_diff_network(scores1, scores2, context1, context2, edge_metric=None, node_metric=None, max_path_length=2, correction='bh', path=None, format='csv', meta_file=None, test_type='nonparametric', nan_value=None)

Compute a differential network defined by a node metric and an edge metric.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.

  • scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.

  • context1 (DataFrame) – Observed data of Context 1, potentially filtered.

  • context2 (DataFrame) – Observed data of Context 2, potentially filtered.

  • edge_metric (Optional[str]) – Edge metric used to construct the differential network. Defaults to None.

  • node_metric (Optional[str]) – Node metric used to construct the differential network. Defaults to None.

  • max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.

  • correction (str) – Correction method for multiple testing. Defaults to ‘bh’.

  • path (Optional[str]) – Optional path to save the differential scores as CSV files. Defaults to None.

  • format (str) – File format to save the differential network. Options are ‘csv’ and ‘graphml’. Defaults to ‘csv’.

  • meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.

  • test_type (str) – Test type to use for continuous nodes in STC metric. Defaults to ‘nonparametric’.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

Return type:

Tuple[Optional[DataFrame], Optional[DataFrame]]

Returns:

A tuple (edges_diff, nodes_diff) containing the computed differential edges and nodes.

modina.compute_diff_nodes(scores1, scores2, context1, context2, node_metric, correction='bh', meta_file=None, test_type='nonparametric', nan_value=None, path=None)

Compute differential node scores based on the specified node metric.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1, rescaled and potentially filtered.

  • scores2 (DataFrame) – Statistical association scores of Context 2, rescaled and potentially filtered.

  • context1 (DataFrame) – Observed data of Context 1, potentially filtered.

  • context2 (DataFrame) – Observed data of Context 2, potentially filtered.

  • node_metric (str) – Node metric to compute the differential node scores.

  • correction (str) – Correction method for multiple testing. Only needed if node_metric is ‘STC’. Defaults to ‘bh’.

  • meta_file (Optional[DataFrame]) – Meta file containing the node types. Only needed if node_metric is ‘STC’. Defaults to None.

  • test_type (str) – Test type to compare continuous variables across contexts for the ‘STC’ node metric. Defaults to ‘nonparametric’.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

  • path (Optional[str]) – Optional path to save the differential node scores as a CSV file. Defaults to None.

Return type:

DataFrame

Returns:

A DataFrame containing the computed differential node scores.

modina.compute_ranking(nodes_diff, edges_diff, ranking_alg, path=None, meta_file=None)

Compute a ranking based on the specified ranking algorithm.

Parameters:
  • nodes_diff (Optional[DataFrame]) – Differential node scores.

  • edges_diff (Optional[DataFrame]) – Differential edge scores.

  • ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’.

  • meta_file (Optional[DataFrame]) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable. Defaults to None.

  • path (Optional[str]) – Optional path to save the ranking as a CSV file. Defaults to None.

Return type:

Tuple[list, dict]

Returns:

A tuple containing the list of ranked nodes and a dictionary with ranked nodes per data type.

modina.diffnet_analysis(context1, context2, meta_file, edge_metric=None, node_metric=None, ranking_alg='PageRank+', filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, max_path_length=2, test_type='nonparametric', nan_value=None, correction='bh', num_workers=1, project_path=None, name1='context1', name2='context2')

Wrapper function to perform an end-to-end differential network analysis following the moDiNA pipeline.

Parameters:
  • context1 (DataFrame) – Observed data of Context 1 (rows: samples, columns: variables).

  • context2 (DataFrame) – Observed data of Context 2 (rows: samples, columns: variables).

  • meta_file (DataFrame) – Metadata file containing a ‘label’ and ‘type’ column to specify the data type of each variable.

  • test_type (str) – Type of statistical tests to use for association score calculation. Defaults to ‘nonparametric’.

  • nan_value (Optional[int]) – Numerical value used for NaN values in the context data. If None, an error will be raised if such values are present. Defaults to None.

  • correction (str) – Correction method for multiple testing. Defaults to ‘bh’.

  • num_workers (int) – Number of workers for parallel processing. Defaults to 1.

  • filter_method (Optional[str]) – Method used for filtering. Defaults to None.

  • filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.

  • filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.

  • filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.

  • edge_metric (Optional[str]) – Edge metric used to construct the differential network. Defaults to None.

  • node_metric (Optional[str]) – Node metric used to construct the differential network. Defaults to None.

  • max_path_length (int) – Maximum length of paths to consider in the computation of integrated interaction scores. Defaults to 2.

  • ranking_alg (str) – Ranking algorithm to compute. Options are ‘PageRank+’, ‘PageRank’, ‘absDimontRank’, ‘DimontRank’, ‘direct_node’ and ‘direct_edge’. Defaults to ‘PageRank+’.

  • name1 (str) – Name of Context 1. Used for saving files. Defaults to ‘context1’.

  • name2 (str) – Name of Context 2. Used for saving files. Defaults to ‘context2’.

  • project_path (Optional[str]) – Optional path to save results. Defaults to None.

Return type:

Tuple[list, dict, Optional[DataFrame], Optional[DataFrame], dict]

Returns:

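A tuple containing the computed ranking, the differential edges, the differential nodes, and the configuration parameters. Per the return type, the ranking occupies the first two elements: a list of ranked nodes followed by a dictionary, which matches the (ranked nodes, ranked nodes per data type) pair returned by compute_ranking.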

modina.filter(scores1, scores2, context1, context2, filter_method=None, filter_param=0.0, filter_metric=None, filter_rule=None, path=None)

Filter association scores and context data based on the specified filtering configurations.

Parameters:
  • scores1 (DataFrame) – Statistical association scores of Context 1.

  • scores2 (DataFrame) – Statistical association scores of Context 2.

  • context1 (DataFrame) – The first context for the differential network analysis.

  • context2 (DataFrame) – The second context for the differential network analysis.

  • filter_method (Optional[str]) – Method used for filtering. Defaults to None.

  • filter_param (float) – Parameter for the specified filtering method. Defaults to 0.0.

  • filter_metric (Optional[str]) – Edge metric used for filtering. Defaults to None.

  • filter_rule (Optional[str]) – Rule to integrate the networks during filtering. Defaults to None.

  • path (Optional[str]) – Optional path to save the filtered scores and context data as CSV files. Defaults to None.

Return type:

Tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Returns:

A tuple containing the filtered scores and context data.

modina.post_rescaling(diff_scores, metric)
modina.pre_rescaling(scores1, scores2, metric)
modina.save_gt(groundtruths, path, mode='node')
modina.simulate_copula(path=None, name1='context1', name2='context2', n_bi=50, n_cont=50, n_cat=50, n_samples=500, n_shift_cont=0, n_shift_bi=0, n_shift_cat=0, n_corr_cont_cont=0, n_corr_bi_bi=0, n_corr_cat_cat=0, n_corr_bi_cont=0, n_corr_bi_cat=0, n_corr_cont_cat=0, n_both_cont_cont=0, n_both_bi_bi=0, n_both_cat_cat=0, n_both_bi_cont=0, n_both_bi_cat=0, n_both_cont_cat=0, shift=0.5, corr=0.7)

Simulate two contexts with binary, continuous and categorical nodes using a Gaussian copula.

Parameters:
  • path – Path to save the simulated contexts, the meta file and the ground truth information. If None, files are not saved.

  • name1 – Name of the first context.

  • name2 – Name of the second context.

  • n_bi – Number of binary nodes to simulate.

  • n_cont – Number of continuous nodes to simulate.

  • n_cat – Number of categorical nodes to simulate.

  • n_samples – Number of samples per context.

  • n_shift_cont – Number of continuous nodes with an artificially introduced mean shift.

  • n_shift_bi – Number of binary nodes with an artificially introduced mean shift.

  • n_shift_cat – Number of categorical nodes with an artificially introduced mean shift.

  • n_corr_cont_cont – Number of continuous node pairs with an artificially introduced correlation difference.

  • n_corr_bi_bi – Number of binary node pairs with an artificially introduced correlation difference.

  • n_corr_cat_cat – Number of categorical node pairs with an artificially introduced correlation difference.

  • n_corr_bi_cat – Number of binary-categorical node pairs with an artificially introduced correlation difference.

  • n_corr_cont_cat – Number of continuous-categorical node pairs with an artificially introduced correlation difference.

  • n_corr_bi_cont – Number of binary-continuous node pairs with an artificially introduced correlation difference.

  • n_both_cont_cont – Number of continuous node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_bi_bi – Number of binary node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_cat_cat – Number of categorical node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_bi_cat – Number of binary-categorical node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_cont_cat – Number of continuous-categorical node pairs with both an artificially introduced mean shift and correlation difference.

  • n_both_bi_cont – Number of binary-continuous node pairs with both an artificially introduced mean shift and correlation difference.

  • shift – Magnitude of the mean shift.

  • corr – Magnitude of the correlation difference (measured as correlation coefficient between 0 and 1).

Returns:

A tuple containing the two simulated contexts, a meta file and the ground truth nodes:

  • context1 – pd.DataFrame of the first simulated context.

  • context2 – pd.DataFrame of the second simulated context.

  • meta – pd.DataFrame containing the data type for each simulated variable.

  • ground_truth – A tuple containing three lists of ground truth nodes: (shift_nodes, corr_nodes, shift_corr_nodes).