opengsl.data.Dataset

class opengsl.data.Dataset(data, feat_norm=False, verbose=True, n_splits=1, split='public', split_params=None, homophily_control=None, path='./data/', cv=None, **kwargs)

Bases: object

Dataset class. This class loads, preprocesses and splits various datasets.

Parameters
  • data (str) – The name of dataset.

  • feat_norm (bool) – Whether to normalize the features.

  • verbose (bool) – Whether to print statistics.

  • n_splits (int) – Number of data splits.

  • homophily_control (float) – The target homophily ratio passed to homophily control. If set to None, the original adjacency matrix (adj) is kept unchanged.

  • path (str) – Path to save dataset files.
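The homophily ratio referred to by homophily_control is commonly measured as edge homophily: the fraction of edges whose two endpoints share the same label. The sketch below computes that quantity with NumPy on a dense adjacency matrix. The helper name edge_homophily is hypothetical, and whether OpenGSL uses exactly this definition is an assumption.

```python
import numpy as np


def edge_homophily(adj: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of edges whose endpoints share the same label.

    Assumes `adj` is a dense, symmetric 0/1 matrix without self loops, so each
    undirected edge appears twice; counting both directions leaves the ratio unchanged.
    """
    src, dst = np.nonzero(adj)  # endpoints of every nonzero (directed) entry
    if src.size == 0:
        return 0.0
    same = labels[src] == labels[dst]  # True where both endpoints agree
    return float(same.mean())


# Toy path graph 0-1-2-3 with labels [0, 0, 1, 1]: 2 of its 3 edges are intra-class.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
labels = np.array([0, 0, 1, 1])
print(edge_homophily(adj, labels))
```

A ratio near 1 indicates a homophilous graph; a ratio near 0 indicates a heterophilous one.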

prepare_data(ds_name, feat_norm=False, verbose=True)

Load various datasets. Homophilous datasets are loaded via PyG, while heterophilous datasets are loaded with hetero_load. The results are saved as self.feats, self.adj, self.labels, self.train_masks, self.val_masks and self.test_masks. Note that self.adj is undirected and has no self loops.
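The invariant stated above (an undirected adjacency with no self loops) can be illustrated with a small NumPy sketch on a dense matrix. This is not OpenGSL's code: the helper name is hypothetical, and the real self.adj may be stored as a sparse matrix or a PyTorch tensor rather than a dense NumPy array.

```python
import numpy as np


def make_undirected_no_self_loops(adj: np.ndarray) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix with an all-zero diagonal."""
    # Keep an edge if it exists in either direction, then binarize.
    sym = ((adj + adj.T) > 0).astype(adj.dtype)
    # Remove self loops in place on the new array.
    np.fill_diagonal(sym, 0)
    return sym


adj = np.array([[1, 1, 0],
                [0, 0, 1],
                [0, 0, 1]])
clean = make_undirected_no_self_loops(adj)
print(clean)
```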

Parameters
  • ds_name (str) – The name of dataset.

  • feat_norm (bool) – Whether to normalize the features.

  • verbose (bool) – Whether to print statistics.
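The feat_norm flag is not specified further on this page. A common choice for node-classification benchmarks is row normalization, where each node's feature vector is divided by the sum of its entries. The sketch below assumes that meaning and uses a hypothetical row_normalize helper; OpenGSL's actual normalization may differ.

```python
import numpy as np


def row_normalize(feats: np.ndarray) -> np.ndarray:
    """Scale each row so its entries sum to 1; all-zero rows stay zero."""
    row_sum = feats.sum(axis=1, keepdims=True).astype(float)
    row_sum[row_sum == 0] = 1.0  # avoid division by zero for empty rows
    return feats / row_sum


feats = np.array([[1.0, 3.0],
                  [0.0, 0.0],
                  [2.0, 2.0]])
print(row_normalize(feats))
```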

split_data(split, n_splits=1, cv=None, split_params=None, verbose=True)

Split the data into training, validation and test sets for various datasets.

Parameters
  • n_splits (int) – Number of data splits.

  • verbose (bool) – Whether to print statistics.
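The masks produced by splitting are stored in the train_masks, val_masks and test_masks lists described for prepare_data. The sketch below shows one way to build n_splits random node splits as boolean masks. The 60/20/20 ratios, the seed handling and the helper name random_masks are assumptions for illustration; this does not reproduce the 'public' split or the other strategies that split may accept.

```python
import numpy as np


def random_masks(n_nodes, n_splits=1, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Return lists of boolean train/val/test masks, one triple per split."""
    rng = np.random.default_rng(seed)
    n_train = int(train_ratio * n_nodes)
    n_val = int(val_ratio * n_nodes)
    train_masks, val_masks, test_masks = [], [], []
    for _ in range(n_splits):
        perm = rng.permutation(n_nodes)  # fresh shuffle for each split
        train = np.zeros(n_nodes, dtype=bool)
        val = np.zeros(n_nodes, dtype=bool)
        test = np.zeros(n_nodes, dtype=bool)
        train[perm[:n_train]] = True
        val[perm[n_train:n_train + n_val]] = True
        test[perm[n_train + n_val:]] = True  # remaining nodes go to test
        train_masks.append(train)
        val_masks.append(val)
        test_masks.append(test)
    return train_masks, val_masks, test_masks


tr, va, te = random_masks(n_nodes=10, n_splits=3)
print([int(m.sum()) for m in (tr[0], va[0], te[0])])
```

Because every node lands in exactly one mask per split, the three masks of a split are disjoint and together cover all nodes.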

split_graphs(split, n_splits, cv, split_params, verbose=True)

Split graph-level datasets, where each sample is a whole graph rather than a node.

Parameters
  • n_splits (int) – Number of data splits.

  • verbose (bool) – Whether to print statistics.
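For graph-level datasets the unit being split is a whole graph. Assuming the cv argument selects k-fold cross-validation (this page does not say so), a split over graph indices could look like the sketch below; kfold_graph_indices is a hypothetical helper, not part of OpenGSL.

```python
import numpy as np


def kfold_graph_indices(n_graphs, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_graphs)
    folds = np.array_split(perm, n_folds)  # near-equal folds; sizes differ by at most 1
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        yield train_idx, test_idx


for train_idx, test_idx in kfold_graph_indices(n_graphs=7, n_folds=3):
    print(len(train_idx), len(test_idx))
```

Each graph appears in exactly one test fold, so averaging metrics over the folds uses every graph once for evaluation.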