region.util module

exception region.util.MissingMetric

Bases: RuntimeError

Raised when a distance metric is required but was not set.

region.util.Move

alias of move

region.util.all_elements_equal(array)
region.util.array_from_df_col(df, attr)

Extract one or more columns from a DataFrame as a numpy array.

Parameters:
  • df (Union[DataFrame, GeoDataFrame]) –
  • attr (Union[str, Sequence[str]]) – The name(s) of the column(s) to extract.
Returns:

col – The specified column(s) of df as a two-dimensional array.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": [1, 2, 3],
...                    "col2": [7, 8, 9]})
>>> (array_from_df_col(df, "col1") == np.array([[1],
...                                         [2],
...                                         [3]])).all()
True
>>> (array_from_df_col(df, ["col1"]) == np.array([[1],
...                                           [2],
...                                           [3]])).all()
True
>>> (array_from_df_col(df, ["col1", "col2"]) == np.array([[1, 7],
...                                                   [2, 8],
...                                                   [3, 9]])).all()
True
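
A minimal sketch of the documented behaviour, assuming the function normalizes attr to a list of column names and selects those columns. The name array_from_df_col_sketch is hypothetical and this is not necessarily region.util's implementation. Selecting with a list of names always returns a DataFrame, which is why even a single column comes back two-dimensional, as in the examples above.

```python
import numpy as np
import pandas as pd


def array_from_df_col_sketch(df: pd.DataFrame, attr) -> np.ndarray:
    """Hypothetical re-implementation: return column(s) of df as a 2-D array."""
    # Normalize a single column name to a one-element list of names.
    cols = [attr] if isinstance(attr, str) else list(attr)
    # Indexing with a list of names yields a DataFrame, hence a 2-D array.
    return df[cols].to_numpy()


df = pd.DataFrame({"col1": [1, 2, 3], "col2": [7, 8, 9]})
print(array_from_df_col_sketch(df, "col1").shape)
print(array_from_df_col_sketch(df, ["col1", "col2"]).shape)
```
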
region.util.array_from_dict_values(dct, sorted_keys=None, flat_output=False, dtype=float)

Return the values of the dictionary passed as the dct argument as a numpy array. By default, the values in the returned array are sorted by the keys of dct.

Parameters:
  • dct (dict) –
  • sorted_keys (iterable, optional) – If passed, the elements of the returned array are ordered according to this argument. It can therefore be used to suppress the sorting, to select a subset of the dictionary’s values, or to repeat values.
  • flat_output (bool, default: False) – If True, the returned array will be one-dimensional. If False, the returned array will be two-dimensional with one row per key in dct.
  • dtype (default: np.float64) – The dtype of the returned array.
Returns:

array

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> dict_flat = {0: 0, 1: 10}
>>> dict_it = {0: [0], 1: [10]}
>>> desired_flat = np.array([0, 10])
>>> desired_2d = np.array([[0],
...                        [10]])
>>> flat_flat = array_from_dict_values(dict_flat, flat_output=True)
>>> (flat_flat == desired_flat).all()
True
>>> flat_2d = array_from_dict_values(dict_flat)
>>> (flat_2d == desired_2d).all()
True
>>> it_flat = array_from_dict_values(dict_it, flat_output=True)
>>> (it_flat == desired_flat).all()
True
>>> it_2d = array_from_dict_values(dict_it)
>>> (it_2d == desired_2d).all()
True
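
To make the roles of sorted_keys and flat_output concrete, here is a minimal sketch that reproduces the examples above. It assumes each value is either a scalar or a flat sequence of equal length; array_from_dict_values_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def array_from_dict_values_sketch(dct, sorted_keys=None, flat_output=False,
                                  dtype=float):
    """Hypothetical sketch of the documented behaviour."""
    if sorted_keys is None:
        sorted_keys = sorted(dct)  # default: order rows by the dict's keys
    # atleast_1d turns scalars and flat lists alike into 1-D rows.
    rows = [np.atleast_1d(np.asarray(dct[key], dtype=dtype))
            for key in sorted_keys]
    arr = np.vstack(rows)  # shape: (len(sorted_keys), values per key)
    return arr.ravel() if flat_output else arr


print(array_from_dict_values_sketch({1: 10, 0: 0}))                # rows ordered by key
print(array_from_dict_values_sketch({0: 0, 1: 10}, [1, 1], True))  # repeated value, 1-D
```
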
region.util.array_from_graph(graph, attr)
Parameters:
  • graph (networkx.Graph) –
  • attr (str or iterable) – If str, it specifies an attribute of the graph’s nodes. If an iterable of strings, multiple attributes of the graph’s nodes are specified.
Returns:

array – Array with one row for each node in graph.

Return type:

numpy.ndarray

Examples

>>> import networkx as nx
>>> import numpy as np
>>> edges = [(0, 1), (1, 2),          # 0 | 1 | 2
...          (0, 3), (1, 4), (2, 5),  # ---------
...          (3, 4), (4, 5)]          # 3 | 4 | 5
>>> graph = nx.Graph(edges)
>>> data_dict = {node: 10*node for node in graph}
>>> nx.set_node_attributes(graph, data_dict, name="test_data")
>>> desired = np.array([[0],
...                     [10],
...                     [20],
...                     [30],
...                     [40],
...                     [50]])
>>> (array_from_graph(graph, "test_data") == desired).all()
True
>>> (array_from_graph(graph, ["test_data"]) == desired).all()
True
>>> (array_from_graph(graph, ["test_data", "test_data"]) ==
...  np.hstack((desired, desired))).all()
True

region.util.array_from_graph_or_dict(graph, attr)
region.util.array_from_region_list(region_list)
Parameters: region_list (list) – Each list element is an iterable of a region’s areas.
Returns: labels – Each element specifies the region of the corresponding area.
Return type: numpy.ndarray

Examples

>>> import numpy as np
>>> obtained = array_from_region_list([{0, 1, 2, 5}, {3, 4}])
>>> desired = np.array([ 0, 0, 0, 1, 1, 0])
>>> (obtained == desired).all()
True
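
The conversion above inverts a list of regions into a per-area label array. A minimal sketch, assuming the areas are the integers 0..n-1 and each area belongs to exactly one region (array_from_region_list_sketch is a hypothetical name):

```python
import numpy as np


def array_from_region_list_sketch(region_list):
    """Hypothetical inverse of a region list: area index -> region label."""
    # Assumption: areas are the integers 0..n-1, each in exactly one region.
    n_areas = sum(len(region) for region in region_list)
    labels = np.empty(n_areas, dtype=int)
    for region_id, areas in enumerate(region_list):
        # The position in region_list becomes the region label.
        labels[list(areas)] = region_id
    return labels


print(array_from_region_list_sketch([{0, 1, 2, 5}, {3, 4}]))  # [0 0 0 1 1 0]
```
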
region.util.assert_feasible(solution, adj, n_regions=None)
Parameters:
  • solution (numpy.ndarray) – Array of region labels.
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
  • n_regions (int or None) – An int represents the desired number of regions. If None, then the number of regions is not checked.
Raises:

exc : ValueError – Raised if the clustering is not spatially contiguous. If the n_regions argument is not None, a ValueError is also raised if the number of regions differs from n_regions.
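
One way to implement the contiguity check described above is to restrict the adjacency matrix to each region's areas and count the connected components of that subgraph: a region is contiguous exactly when there is one component. The sketch below does this with scipy.sparse.csgraph.connected_components; it is an assumption about how such a check can work, not necessarily how assert_feasible is written, and assert_feasible_sketch is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def assert_feasible_sketch(solution, adj, n_regions=None):
    """Hypothetical feasibility check mirroring the documented contract."""
    solution = np.asarray(solution)
    adj = csr_matrix(adj)
    region_ids = np.unique(solution)
    if n_regions is not None and len(region_ids) != n_regions:
        raise ValueError(f"Expected {n_regions} regions, found {len(region_ids)}.")
    for region in region_ids:
        members = np.flatnonzero(solution == region)
        # Restrict the contiguity graph to the areas of this region.
        sub_adj = adj[members][:, members]
        n_parts, _ = connected_components(sub_adj, directed=False)
        if n_parts > 1:
            raise ValueError(f"Region {region} is not spatially contiguous.")


# Four areas in a row: 0 - 1 - 2 - 3.
path = csr_matrix(np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]]))
assert_feasible_sketch(np.array([0, 0, 1, 1]), path, n_regions=2)  # no error
```
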

region.util.check_solver(solver)
region.util.copy_func(f)

Return a copy of a function. This is useful, e.g., for creating aliases whose docstrings can be changed without affecting the original function. The implementation is taken from https://stackoverflow.com/a/13503277.

Parameters: f (function) – The function to copy.
Returns: g – Copy of f.
Return type: function
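
The linked answer is not reproduced here. As a minimal sketch of the general technique, assuming f is a plain Python function (not a builtin or bound method), a copy can be built with types.FunctionType from f's code object, globals, defaults and closure, after which functools.update_wrapper copies the metadata. copy_func_sketch is a hypothetical name.

```python
import functools
import types


def copy_func_sketch(f):
    """Hypothetical sketch: return an independent copy of the plain function f."""
    g = types.FunctionType(
        f.__code__,
        f.__globals__,
        name=f.__name__,
        argdefs=f.__defaults__,
        closure=f.__closure__,
    )
    # Copy __doc__, __name__, __qualname__, __module__ etc. and f.__dict__.
    g = functools.update_wrapper(g, f)
    # Keyword-only defaults are not covered by the constructor; copy the dict
    # so that later changes to one function's defaults do not affect the other.
    g.__kwdefaults__ = None if f.__kwdefaults__ is None else dict(f.__kwdefaults__)
    return g


def greet(name, *, punctuation="!"):
    """Original docstring."""
    return f"Hello, {name}{punctuation}"


alias = copy_func_sketch(greet)
alias.__doc__ = "Alias docstring."
print(greet.__doc__)  # Original docstring.
print(alias("Ada"))   # Hello, Ada!
```
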
region.util.count(arr, el)
Parameters:
  • arr (numpy.ndarray) – The array in which to count.
  • el – The element to count.
Returns:

result – The number of occurrences of el in arr.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> arr = np.array([0, 0, 0, 1, 1])
>>> count(arr, 0)
3
>>> count(arr, 1)
2
>>> count(arr, 2)
0

region.util.dataframe_to_dict(df, cols)
Parameters:
  • df (Union[pandas.DataFrame, geopandas.GeoDataFrame]) –
  • cols (Union[str, list]) – If str, then it is the name of a column of df. If list, then it is a list of strings. Each string is the name of a column of df.
Returns:

result – The keys are the elements of the DataFrame’s index. Each value is a numpy.ndarray holding the corresponding values in the columns specified by cols.

Return type:

dict

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"data": [100, 120, 115]})
>>> result = dataframe_to_dict(df, "data")
>>> result == {0: 100, 1: 120, 2: 115}
True
>>> import numpy as np
>>> df = pd.DataFrame({"data": [100, 120],
...                    "other": [1, 2]})
>>> actual = dataframe_to_dict(df, ["data", "other"])
>>> desired = {0: np.array([100, 1]), 1: np.array([120, 2])}
>>> all(np.array_equal(actual[i], desired[i]) for i in desired)
True

region.util.dict_from_graph_attr(graph, attr, array_values=False)
Parameters:
  • graph (networkx.Graph) –
  • attr (str, iterable, or dict) – If str, it specifies an attribute of the graph’s nodes. If an iterable of strings, multiple attributes of the graph’s nodes are specified. If dict, each key is a node and each value the corresponding attribute value. (This format is also this function’s return format.)
  • array_values (bool, default: False) – If True, then each value is transformed into a numpy.ndarray.
Returns:

result_dict – Each key is a node in the graph. If array_values is False, each value is a list of the attribute values of the key node. If array_values is True, each such list is turned into a numpy.ndarray, which requires the values to be shape-compatible for stacking.

Return type:

dict

Examples

>>> import networkx as nx
>>> edges = [(0, 1), (1, 2),          # 0 | 1 | 2
...          (0, 3), (1, 4), (2, 5),  # ---------
...          (3, 4), (4, 5)]          # 3 | 4 | 5
>>> graph = nx.Graph(edges)
>>> data_dict = {node: 10*node for node in graph}
>>> nx.set_node_attributes(graph, data_dict, name="test_data")
>>> desired = {key: [value] for key, value in data_dict.items()}
>>> dict_from_graph_attr(graph, "test_data") == desired
True
>>> dict_from_graph_attr(graph, ["test_data"]) == desired
True

region.util.distribute_regions_among_components(component_labels, n_regions)
Parameters:
  • component_labels (list) –

    Each element specifies to which connected component an area belongs. An example would be [0, 0, 1, 0, 0, 1] for the following two islands:

    island one        island two
    .-------.         .---.
    | 0 | 1 |         | 2 |
    | - - - |         | - |
    | 3 | 4 |         | 5 |
    `-------´         `---´
    
  • n_regions (int) –
Returns:

result_dict – Each key is a label of a connected component. Each value specifies into how many regions the component is to be clustered.

Return type:

Dict[int, int]
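
The documentation above fixes the input and output formats but not the allocation rule. The sketch below only illustrates that contract under an assumed rule: every component gets at least one region, no component gets more regions than areas, and each remaining region goes greedily to the component with the most areas per region so far. This rule is an assumption, not necessarily what region.util does, and distribute_regions_sketch is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def distribute_regions_sketch(component_labels: Sequence[Hashable],
                              n_regions: int) -> Dict[Hashable, int]:
    """Hypothetical allocation of n_regions among connected components."""
    sizes = Counter(component_labels)  # number of areas per component
    if not len(sizes) <= n_regions <= len(component_labels):
        raise ValueError("Need at least one region per component and "
                         "at most one region per area.")
    # Assumption: every component receives at least one region.
    result = {comp: 1 for comp in sizes}
    for _ in range(n_regions - len(sizes)):
        # Only components with more areas than regions can take another one.
        candidates = [c for c in sizes if result[c] < sizes[c]]
        # Assumed greedy rule: most areas per region gets the next region.
        best = max(candidates, key=lambda c: sizes[c] / result[c])
        result[best] += 1
    return result


# The two islands from the example above: four areas vs. two areas.
print(distribute_regions_sketch([0, 0, 1, 0, 0, 1], 3))
```
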

region.util.find_sublist_containing(el, lst, index=False)
Parameters:
  • el – The element to search for in the sublists of lst.
  • lst (collections.abc.Sequence) – A sequence of sequences or sets.
  • index (bool, default: False) – If False (default), the subsequence or subset containing el is returned. If True, the index of the subsequence or subset in lst is returned.
Returns:

result – See the index argument for more information.

Return type:

collections.abc.Sequence, collections.abc.Set or int

Raises:

exc : LookupError – If el is not in any of the elements of lst.

Examples

>>> lst = [{0, 1}, {2}]
>>> find_sublist_containing(0, lst, index=False) == {0, 1}
True
>>> find_sublist_containing(0, lst, index=True) == 0
True
>>> find_sublist_containing(2, lst, index=False) == {2}
True
>>> find_sublist_containing(2, lst, index=True) == 1
True
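
The lookup described above amounts to a linear scan over lst. A minimal sketch that returns the first matching element (find_sublist_containing_sketch is a hypothetical name; the library may differ in details such as the error message):

```python
from typing import Any, Collection, Sequence, Union


def find_sublist_containing_sketch(el: Any, lst: Sequence[Collection],
                                   index: bool = False) -> Union[Collection, int]:
    """Hypothetical linear search for the first element of lst containing el."""
    for i, sub in enumerate(lst):
        if el in sub:
            # Return the position or the subsequence/subset itself.
            return i if index else sub
    raise LookupError(f"{el!r} is not in any of the elements of lst.")


print(find_sublist_containing_sketch(2, [{0, 1}, {2}], index=True))  # 1
```
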
region.util.generate_initial_sol(adj, n_regions)

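Generate a random initial clustering, one connected component at a time.

Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
  • n_regions (int) – The desired number of regions.
Yields:

region_labels (numpy.ndarray) – An array with -1 for areas that are not part of the yielded component and an integer >= 0 specifying the region of each area within the yielded component.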

region.util.get_metric_function(metric=None)
Parameters: metric (str, function, or None, default: None) –

Using None is equivalent to using “euclidean”.

If str, then this string specifies the distance metric (from scikit-learn) to use for calculating the objective function. Possible values are:

  • “cityblock” for sklearn.metrics.pairwise.manhattan_distances
  • “cosine” for sklearn.metrics.pairwise.cosine_distances
  • “euclidean” for sklearn.metrics.pairwise.euclidean_distances
  • “l1” for sklearn.metrics.pairwise.manhattan_distances
  • “l2” for sklearn.metrics.pairwise.euclidean_distances
  • “manhattan” for sklearn.metrics.pairwise.manhattan_distances

If function, then this function should take two arguments and return a scalar value. Furthermore, the following conditions must be fulfilled:

  1. d(a, b) >= 0 for all a and b (non-negativity)
  2. d(a, b) == 0 if and only if a == b (positive definiteness)
  3. d(a, b) == d(b, a) (symmetry)
  4. d(a, c) <= d(a, b) + d(b, c) (triangle inequality)
Returns: metric_func – If the metric argument is a function, it is returned unchanged. If it is a string, the corresponding distance function from sklearn.metrics.pairwise is returned. If it is None, sklearn.metrics.pairwise.euclidean_distances is returned.
Return type: function
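
Since a custom metric function is passed through as-is, conditions 1 to 4 are the caller's responsibility. The hypothetical helper spot_check_metric below tests them on every pair and triple of a finite sample of points. Such a check can expose a violation but cannot prove the conditions hold in general. Chebyshev distance passes, while squared Euclidean distance, although symmetric and non-negative, fails the triangle inequality.

```python
import itertools

import numpy as np


def chebyshev(a, b):
    """Chebyshev (L-infinity) distance between two 1-D vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.max(np.abs(diff)))


def squared_euclidean(a, b):
    """Squared Euclidean distance: symmetric and >= 0, but not a metric."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff @ diff)


def spot_check_metric(d, points, tol=1e-12):
    """Hypothetical helper: test conditions 1-4 on a finite sample of points."""
    for a, b in itertools.product(points, repeat=2):
        dab = d(a, b)
        if dab < -tol:                                   # 1. non-negativity
            return False
        if np.array_equal(a, b) != (abs(dab) <= tol):    # 2. zero iff equal
            return False
        if abs(dab - d(b, a)) > tol:                     # 3. symmetry
            return False
    for a, b, c in itertools.product(points, repeat=3):
        if d(a, c) > d(a, b) + d(b, c) + tol:            # 4. triangle inequality
            return False
    return True


pts = [np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([-3.0, 0.5])]
line = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
print(spot_check_metric(chebyshev, pts))          # True
print(spot_check_metric(squared_euclidean, line))  # False, since 4 > 1 + 1
```
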
region.util.get_solver_instance(solver_string)
region.util.make_move(moving_area, new_label, labels)

Modify the labels argument in place (no return value!) such that the area moving_area has the new region label new_label.

Parameters:
  • moving_area – The area to be moved (assigned to a new region).
  • new_label (int) – The new region label of area moving_area.
  • labels (numpy.ndarray) – Each element is the region label of the area corresponding to that array index.

Examples

>>> import numpy as np
>>> labels = np.array([0, 0, 0, 0, 1, 1])
>>> make_move(3, 1, labels)
>>> (labels == np.array([0, 0, 0, 1, 1, 1])).all()
True

region.util.pop_randomly_from(lst)
region.util.raise_distance_metric_not_set(x, y)
region.util.random_element_from(lst)
region.util.scipy_sparse_matrix_from_dict(neighbors)
Parameters: neighbors (dict) – Each key represents an area. The corresponding value contains the area’s neighbors.
Returns: adj – Adjacency matrix representing the areas’ contiguity relation.
Return type: scipy.sparse.csr_matrix

Examples

>>> import numpy as np
>>> neighbors = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5},
...              3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
>>> obtained = scipy_sparse_matrix_from_dict(neighbors)
>>> desired = np.array([[0, 1, 0, 1, 0, 0],
...                     [1, 0, 1, 0, 1, 0],
...                     [0, 1, 0, 0, 0, 1],
...                     [1, 0, 0, 0, 1, 0],
...                     [0, 1, 0, 1, 0, 1],
...                     [0, 0, 1, 0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
>>> neighbors = {"left": {"middle"},
...              "middle": {"left", "right"},
...              "right": {"middle"}}
>>> obtained = scipy_sparse_matrix_from_dict(neighbors)
>>> desired = np.array([[0, 1, 0],
...                     [1, 0, 1],
...                     [0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
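
The conversion above can be sketched by mapping each area to a row/column index and collecting one (row, col) pair per neighbor relation, which scipy.sparse.csr_matrix accepts in its (data, (row, col)) form. The sketch assumes rows and columns follow the dict's insertion order; which order region.util uses is not stated here, and sparse_adjacency_sketch is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_adjacency_sketch(neighbors):
    """Hypothetical sketch: build a CSR adjacency matrix from a neighbor dict."""
    # Assumption: rows/columns follow the dict's insertion order of areas.
    index = {area: i for i, area in enumerate(neighbors)}
    rows, cols = [], []
    for area, area_neighbors in neighbors.items():
        for neighbor in area_neighbors:
            rows.append(index[area])
            cols.append(index[neighbor])
    n = len(index)
    data = np.ones(len(rows), dtype=int)
    # COO-style (data, (row, col)) input; duplicate pairs would be summed.
    return csr_matrix((data, (rows, cols)), shape=(n, n))


neighbors = {"left": {"middle"}, "middle": {"left", "right"}, "right": {"middle"}}
print(sparse_adjacency_sketch(neighbors).toarray())
```
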
region.util.scipy_sparse_matrix_from_w(w)
Parameters: w (libpysal.weights.weights.W) – A W object representing the areas’ contiguity relation.
Returns: adj – Adjacency matrix representing the areas’ contiguity relation.
Return type: scipy.sparse.csr_matrix

Examples

>>> import libpysal as ps
>>> import numpy as np
>>> neighbor_dict = {0: {1}, 1: {0, 2}, 2: {1}}
>>> w = ps.weights.W(neighbor_dict)
>>> obtained = scipy_sparse_matrix_from_w(w)
>>> desired = np.array([[0, 1, 0],
...                     [1, 0, 1],
...                     [0, 1, 0]])
>>> (obtained.todense() == desired).all()
True

region.util.separate_components(adj, labels)

Take a labels array and yield modified copies of it, one per connected component. Each yielded array keeps the original labels (integers >= 0) at the indices belonging to the current connected component and is -1 at all other indices.

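Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
  • labels (numpy.ndarray) – Array of region labels.
Yields:

comp_dict (numpy.ndarray) – Each yielded array represents one connected component of the graph specified by the adj argument: it holds the region label of each area in that component and -1 for every other area.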

Examples

>>> import networkx as nx
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> edges_island1 = [(0, 1), (1, 2),          # 0 | 1 | 2
...                  (0, 3), (1, 4), (2, 5),  # ---------
...                  (3, 4), (4, 5)]          # 3 | 4 | 5
>>>
>>> edges_island2 = [(6, 7),                  # 6 | 7
...                  (6, 8), (7, 9),          # -----
...                  (8, 9)]                  # 8 | 9
>>>
>>> graph = nx.Graph(edges_island1 + edges_island2)
>>> adj = csr_matrix(nx.to_scipy_sparse_array(graph))
>>>
>>> # island 1: island divided into regions 0, 1, and 2
>>> sol_island1 = [area%3 for area in range(6)]
>>> # island 2: all areas are in region 3
>>> sol_island2 = [3 for area in range(6, 10)]
>>> labels = np.array(sol_island1 + sol_island2)
>>>
>>> yielded = list(separate_components(adj, labels))
>>> yielded.sort(key=lambda arr: arr[0], reverse=True)
>>> (yielded[0] == np.array([0, 1, 2, 0, 1, 2, -1, -1, -1, -1])).all()
True
>>> (yielded[1] == np.array([-1, -1, -1, -1, -1, -1, 3, 3, 3, 3])).all()
True
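
The masking described above can be sketched with scipy.sparse.csgraph.connected_components, which assigns every area a component number: for each component, copy the labels of its areas and fill everything else with -1. This is a sketch of the technique under that assumption, not necessarily the library's code; separate_components_sketch is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def separate_components_sketch(adj, labels):
    """Hypothetical sketch: yield one masked copy of labels per component."""
    labels = np.asarray(labels)
    # comp_of_area[i] is the connected component that area i belongs to.
    n_comps, comp_of_area = connected_components(adj, directed=False)
    for comp in range(n_comps):
        in_comp = comp_of_area == comp
        masked = np.full_like(labels, -1)  # -1 for areas outside this component
        masked[in_comp] = labels[in_comp]  # original labels inside it
        yield masked


# Two islands: areas 0-1 and areas 2-3.
adj = csr_matrix(np.array([[0, 1, 0, 0],
                           [1, 0, 0, 0],
                           [0, 0, 0, 1],
                           [0, 0, 1, 0]]))
for arr in separate_components_sketch(adj, np.array([0, 0, 1, 1])):
    print(arr)
```
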
region.util.w_from_gdf(gdf, contiguity)

Get a W object from a GeoDataFrame.

Parameters:
  • gdf (GeoDataFrame) –
  • contiguity ({"rook", "queen"}) –
Returns:

weights – The contiguity information contained in the gdf argument in the form of a W object.

Return type:

W