Data Process

fedgraph.data_process.load_data(dataset_str: str) → tuple

Loads input data from the ‘gcn/data’ directory and processes the datasets into a format suitable for training GCN and similar models.

Parameters:

dataset_str (str) – Name of the dataset to be loaded.

Returns:

  • features (torch.Tensor) – Node feature matrix as a float tensor.

  • adj (torch.Tensor or torch_sparse.tensor.SparseTensor) – Adjacency matrix of the graph.

  • labels (torch.Tensor) – Labels of the nodes.

  • idx_train (torch.LongTensor) – Indices of training nodes.

  • idx_val (torch.LongTensor) – Indices of validation nodes.

  • idx_test (torch.LongTensor) – Indices of test nodes.

Note

  • ind.dataset_str.x => the feature vectors of the training instances, as a scipy.sparse.csr.csr_matrix object.

  • ind.dataset_str.tx => the feature vectors of the test instances, as a scipy.sparse.csr.csr_matrix object.

  • ind.dataset_str.allx => the feature vectors of both labeled and unlabeled training instances (a superset of ind.dataset_str.x), as a scipy.sparse.csr.csr_matrix object.

  • ind.dataset_str.y => the one-hot labels of the labeled training instances, as a numpy.ndarray object.

  • ind.dataset_str.ty => the one-hot labels of the test instances, as a numpy.ndarray object.

  • ind.dataset_str.ally => the labels for the instances in ind.dataset_str.allx, as a numpy.ndarray object.

  • ind.dataset_str.graph => a dict in the format {index: [index_of_neighbor_nodes]}, as a collections.defaultdict object.

  • ind.dataset_str.test.index => the indices of the test instances in the graph, for the inductive setting, as a list object.

All objects above must be saved using the Python pickle module.
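As an illustration of the file layout described in the note, the following minimal sketch pickles a toy adjacency dict under a name that follows the ind.dataset_str.graph pattern and reads it back. The dataset name "toy", the temporary directory, and the three-node graph are assumptions made for this example only; they are not part of fedgraph, and load_data itself may read the files differently.

```python
import os
import pickle
import tempfile
from collections import defaultdict

# Hypothetical toy graph in the {index: [index_of_neighbor_nodes]} format.
toy_graph = defaultdict(list)
toy_graph[0] = [1, 2]
toy_graph[1] = [0]
toy_graph[2] = [0]

with tempfile.TemporaryDirectory() as tmp_dir:
    # "toy" stands in for dataset_str; the name is an assumption.
    graph_path = os.path.join(tmp_dir, "ind.toy.graph")

    # Save the object with the pickle module, as the note requires.
    with open(graph_path, "wb") as f:
        pickle.dump(toy_graph, f)

    # Read it back the same way a loader could.
    with open(graph_path, "rb") as f:
        loaded_graph = pickle.load(f)

print(dict(loaded_graph))
```

The other files (x, tx, allx, y, ty, ally) can be written the same way, with scipy sparse matrices or NumPy arrays in place of the dict.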

fedgraph.data_process.normalize(mx: csc_matrix) → csr_matrix

Row-normalizes a sparse matrix for efficient graph computation.

Parameters:

mx (sparse matrix) – Input sparse matrix to row-normalize.

Returns:

mx – Row-normalized sparse matrix.

Return type:

sparse matrix

Note

Row normalization is commonly used in graph algorithms so that nodes contribute equally regardless of their degree, and to make numerical computations more stable.
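To make the idea concrete, here is a minimal sketch of row normalization with SciPy: each row is divided by its sum, and all-zero rows are left as zeros. The function name row_normalize_sketch and the sample matrix are assumptions for illustration; the actual fedgraph implementation may differ in detail.

```python
import numpy as np
import scipy.sparse as sp


def row_normalize_sketch(mx):
    """Divide every row of a sparse matrix by its row sum (illustrative only)."""
    # Row sums as a flat float array.
    row_sum = np.asarray(mx.sum(axis=1)).flatten().astype(float)

    # Inverse of each row sum; rows that sum to zero get a factor of 0
    # so they stay all-zero instead of producing inf or NaN.
    with np.errstate(divide="ignore"):
        row_inv = np.power(row_sum, -1.0)
    row_inv[np.isinf(row_inv)] = 0.0

    # Left-multiplying by a diagonal matrix scales each row.
    return sp.diags(row_inv).dot(mx).tocsr()


# Hypothetical 3-node example; the last row has no entries.
adj = sp.csr_matrix(np.array([[1.0, 1.0, 0.0],
                              [0.0, 2.0, 2.0],
                              [0.0, 0.0, 0.0]]))
normalized = row_normalize_sketch(adj)
print(normalized.toarray())
```

After normalization every non-empty row sums to 1, so a node with many neighbors does not dominate an aggregation step purely because of its degree.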

fedgraph.data_process.parse_index_file(filename: str) → list

Reads and parses an index file.

Parameters:

filename (str) – Name or path of the file to parse.

Returns:

index – List of integers, one per line of the input file; each entry is the integer value parsed from that line.

Return type:

list
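The behavior described above can be sketched as follows, assuming the index file holds one integer per line. The function name parse_index_file_sketch, the file name, and its contents are hypothetical and used only to show the expected input and output shape.

```python
import os
import tempfile


def parse_index_file_sketch(filename):
    """Return the integers found in a file, one per line (illustrative only)."""
    index = []
    with open(filename) as f:
        for line in f:
            # Strip the trailing newline and convert the text to an int.
            index.append(int(line.strip()))
    return index


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical index file following the ind.dataset_str.test.index pattern.
    index_path = os.path.join(tmp_dir, "ind.toy.test.index")
    with open(index_path, "w") as f:
        f.write("3\n1\n2\n")

    parsed = parse_index_file_sketch(index_path)

print(parsed)
```

The order of the lines is preserved, so the result is not sorted unless the file already is.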