Data Process
- fedgraph.data_process.GC_rand_split_chunk(graphs: list, num_trainer: int = 10, overlap: bool = False, seed: int = 42) → list
Randomly split graphs into chunks for each trainer.
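To make the role of the parameters concrete, here is a minimal sketch of random chunking. It is not the library's implementation: the function name rand_split_chunks and the exact splitting strategy (round-robin dealing for the disjoint case, sampling with replacement for the overlapping case) are assumptions chosen for illustration only.

```python
import random


def rand_split_chunks(items, num_chunks=10, overlap=False, seed=42):
    """Illustrative only: randomly assign items to num_chunks lists."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    if overlap:
        # Each chunk is sampled independently with replacement,
        # so the same item may appear in several chunks.
        size = max(1, len(items) // num_chunks)
        return [rng.choices(items, k=size) for _ in range(num_chunks)]
    # Disjoint case: shuffle once, then deal items round-robin across chunks.
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::num_chunks] for i in range(num_chunks)]


if __name__ == "__main__":
    for chunk in rand_split_chunks(list(range(23)), num_chunks=5):
        print(chunk)
```

With overlap left at False, every item ends up in exactly one chunk; fixing the seed makes the split reproducible across runs.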
- fedgraph.data_process.NC_load_data(dataset_str: str) → tuple
Loads input data from the ‘gcn/data’ directory and processes it into a format suitable for training GCN and similar models.
- Parameters:
dataset_str (str) – Name of the dataset to be loaded.
- Returns:
features (torch.Tensor) – Node feature matrix as a float tensor.
adj (torch.Tensor or torch_sparse.tensor.SparseTensor) – Adjacency matrix of the graph.
labels (torch.Tensor) – Labels of the nodes.
idx_train (torch.LongTensor) – Indices of training nodes.
idx_val (torch.LongTensor) – Indices of validation nodes.
idx_test (torch.LongTensor) – Indices of test nodes.
Note
ind.dataset_str.x => the feature vectors of the training instances as scipy.sparse.csr.csr_matrix object;
ind.dataset_str.tx => the feature vectors of the test instances as scipy.sparse.csr.csr_matrix object;
ind.dataset_str.allx => the feature vectors of both labeled and unlabeled training instances (a superset of ind.dataset_str.x) as scipy.sparse.csr.csr_matrix object;
ind.dataset_str.y => the one-hot labels of the labeled training instances as numpy.ndarray object;
ind.dataset_str.ty => the one-hot labels of the test instances as numpy.ndarray object;
ind.dataset_str.ally => the labels for instances in ind.dataset_str.allx as numpy.ndarray object;
ind.dataset_str.graph => a dict in the format {index: [index_of_neighbor_nodes]} as collections.defaultdict object;
ind.dataset_str.test.index => the indices of test instances in graph, for the inductive setting as list object.
All objects above must be saved using the Python pickle module.
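The file naming and pickling convention in the note can be sketched with the standard library alone. The helper names save_object and load_objects, the dataset name "toy", and the tiny graph are hypothetical and exist only for illustration; this is not how NC_load_data itself is implemented.

```python
import pickle
import tempfile
from collections import defaultdict
from pathlib import Path


def save_object(data_dir, dataset_str, suffix, obj):
    """Pickle one object to <data_dir>/ind.<dataset_str>.<suffix>."""
    path = Path(data_dir) / f"ind.{dataset_str}.{suffix}"
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_objects(data_dir, dataset_str, suffixes):
    """Unpickle the requested ind.<dataset_str>.<suffix> files into a dict."""
    loaded = {}
    for suffix in suffixes:
        path = Path(data_dir) / f"ind.{dataset_str}.{suffix}"
        with open(path, "rb") as f:
            loaded[suffix] = pickle.load(f)
    return loaded


if __name__ == "__main__":
    # Hypothetical three-node graph in the {index: [neighbor indices]} format.
    graph = defaultdict(list, {0: [1, 2], 1: [0], 2: [0]})
    with tempfile.TemporaryDirectory() as data_dir:
        save_object(data_dir, "toy", "graph", graph)
        print(load_objects(data_dir, "toy", ["graph"])["graph"])
```

The feature and label files would be written the same way, with scipy sparse matrices and NumPy arrays as the pickled objects.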
- fedgraph.data_process.NC_parse_index_file(filename: str) → list
Reads and parses an index file.
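The file layout is not documented here. As a rough sketch, assuming the index file holds one integer per line (as the ind.dataset_str.test.index file used by GCN-style loaders typically does), parsing could look like the following. The function name parse_index_file is a stand-in, not the library function.

```python
def parse_index_file(filename):
    """Return the integers found in filename, one per non-empty line.

    Assumption: the file contains one integer index per line.
    """
    with open(filename, encoding="utf-8") as f:
        # int() tolerates surrounding whitespace, including the newline.
        return [int(line) for line in f if line.strip()]
```

Blank lines are skipped, and a line that is not an integer raises ValueError instead of being silently dropped.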
- fedgraph.data_process.data_loader(args: AttriDict) → Any
Load data for federated learning tasks.
- Parameters:
args (AttriDict) – The configuration of the task.
- Returns:
data – The data for the task.
- Return type:
Any
Note
The function will call the corresponding data loader function based on the task. If the task is “NC”, the function will call data_loader_NC. If the task is “GC”, the function will call data_loader_GC. If the task is “LP”, only the country code needs to be specified at this stage, and the function will return None.
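The dispatch behavior described in the note can be sketched as below. This is a hypothetical illustration: dispatch_loader and the stub loaders are invented names, and the real function reads the task from args rather than taking it as a string.

```python
def dispatch_loader(task, loaders):
    """Call the loader registered for task; "LP" loads nothing and returns None."""
    if task == "LP":
        # Link prediction: nothing is loaded at this stage.
        return None
    try:
        loader = loaders[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    return loader()


if __name__ == "__main__":
    # Stub loaders standing in for data_loader_NC and data_loader_GC.
    stubs = {"NC": lambda: "node classification data",
             "GC": lambda: "graph classification data"}
    for task in ("NC", "GC", "LP"):
        print(task, "->", dispatch_loader(task, stubs))
```

Raising on an unknown task is a design choice of this sketch; the source does not say how the library handles that case.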
- fedgraph.data_process.data_loader_GC(args: AttriDict) → dict
Load data for graph classification tasks.
- Parameters:
args (AttriDict) – The configuration of the task.
- Returns:
data – The data for the task.
- Return type:
dict
- fedgraph.data_process.data_loader_GC_multiple(datapath: str, dataset_group: str = 'small', batch_size: int = 32, convert_x: bool = False, seed: int = 42) → dict
Graph Classification: prepare data from a group of datasets for multiple trainers.
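- Parameters:
datapath (str) – The input path of data.
dataset_group (str) – The name of the dataset group.
batch_size (int) – The batch size for graph classification.
convert_x (bool) – Whether to convert node features to one-hot degree.
seed (int) – Seed for randomness.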
- Returns:
splited_data – The data for each trainer.
- Return type:
dict
- fedgraph.data_process.data_loader_GC_single(datapath: str, dataset: str = 'PROTEINS', num_trainer: int = 10, batch_size: int = 128, convert_x: bool = False, seed: int = 42, overlap: bool = False) → dict
Graph Classification: prepare data from one dataset for multiple trainers.
- Parameters:
datapath (str) – The input path of data.
dataset (str) – The name of the dataset, which must be available in TUDataset.
num_trainer (int) – The number of trainers.
batch_size (int) – The batch size for graph classification.
convert_x (bool) – Whether to convert node features to one-hot degree.
seed (int) – Seed for randomness.
overlap (bool) – Whether trainers have overlapped data.
- Returns:
splited_data – The data for each trainer.
- Return type:
dict