region.util module
-
exception region.util.MissingMetric
Bases: RuntimeError
Raised when a distance metric is required but was not set.
-
region.util.Move
alias of move
-
region.util.all_elements_equal(array)
-
region.util.array_from_df_col(df, attr)
Extract one or more columns from a DataFrame as a numpy array.
Returns: col – The specified column(s) of the DataFrame.
Return type: numpy.ndarray
Examples
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": [1, 2, 3],
...                    "col2": [7, 8, 9]})
>>> (array_from_df_col(df, "col1") == np.array([[1],
...                                             [2],
...                                             [3]])).all()
True
>>> (array_from_df_col(df, ["col1"]) == np.array([[1],
...                                               [2],
...                                               [3]])).all()
True
>>> (array_from_df_col(df, ["col1", "col2"]) == np.array([[1, 7],
...                                                       [2, 8],
...                                                       [3, 9]])).all()
True
-
region.util.array_from_dict_values(dct, sorted_keys=None, flat_output=False, dtype=<class 'float'>)
Return the values of the dictionary dct as a numpy array. By default, the values in the returned array are ordered by the sorted keys of dct.
Parameters:
- dct (dict) –
- sorted_keys (iterable, optional) – If passed, the elements of the returned array follow the order of the keys in this argument. It can therefore be used to suppress the sorting, to select a subset of the dictionary's values, or to repeat values.
- flat_output (bool, default: False) – If True, the returned array will be one-dimensional. If False, the returned array will be two-dimensional with one row per key in dct.
- dtype (default: np.float64) – The dtype of the returned array.
Returns: array
Return type: numpy.ndarray
Examples
>>> import numpy as np
>>> dict_flat = {0: 0, 1: 10}
>>> dict_it = {0: [0], 1: [10]}
>>> desired_flat = np.array([0, 10])
>>> desired_2d = np.array([[0],
...                        [10]])
>>> flat_flat = array_from_dict_values(dict_flat, flat_output=True)
>>> (flat_flat == desired_flat).all()
True
>>> flat_2d = array_from_dict_values(dict_flat)
>>> (flat_2d == desired_2d).all()
True
>>> it_flat = array_from_dict_values(dict_it, flat_output=True)
>>> (it_flat == desired_flat).all()
True
>>> it_2d = array_from_dict_values(dict_it)
>>> (it_2d == desired_2d).all()
True
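The effect of sorted_keys described above (suppressing the sort, selecting a subset, repeating values) can be illustrated with a small stand-in. This is a sketch only: dict_values_to_array is a hypothetical name, and its body is an assumption about the documented behaviour, not the code in region.util.

```python
import numpy as np


def dict_values_to_array(dct, sorted_keys=None, flat_output=False, dtype=float):
    """Hypothetical stand-in for array_from_dict_values (not the library code)."""
    if sorted_keys is None:
        # Default: order the values by the sorted dictionary keys.
        sorted_keys = sorted(dct)
    # Look the values up in the requested order; keys may repeat or be a subset.
    values = [dct[key] for key in sorted_keys]
    arr = np.array(values, dtype=dtype)
    if flat_output:
        return arr.flatten()
    # Two-dimensional output with one row per requested key.
    return arr.reshape(len(values), -1)


dct = {2: 20, 0: 0, 1: 10}
default_order = dict_values_to_array(dct, flat_output=True)
subset = dict_values_to_array(dct, sorted_keys=[2, 0], flat_output=True)
repeated = dict_values_to_array(dct, sorted_keys=[1, 1, 2], flat_output=True)
two_dim = dict_values_to_array(dct)
```

Here default_order follows the keys 0, 1, 2, while subset and repeated follow exactly the key sequence that was passed in.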
-
region.util.array_from_graph(graph, attr)
Parameters:
- graph (networkx.Graph) –
- attr (str or iterable) – If str, then it specifies an attribute of the graph's nodes. If iterable of strings, then multiple attributes of the graph's nodes are specified.
Returns: array – Array with one row for each node in graph.
Return type: numpy.ndarray
Examples
>>> import networkx as nx
>>> import numpy as np
>>> edges = [(0, 1), (1, 2),          # 0 | 1 | 2
...          (0, 3), (1, 4), (2, 5),  # ---------
...          (3, 4), (4, 5)]          # 3 | 4 | 5
>>> graph = nx.Graph(edges)
>>> data_dict = {node: 10*node for node in graph}
>>> nx.set_node_attributes(graph, name="test_data", values=data_dict)
>>> desired = np.array([[0],
...                     [10],
...                     [20],
...                     [30],
...                     [40],
...                     [50]])
>>> (array_from_graph(graph, "test_data") == desired).all()
True
>>> (array_from_graph(graph, ["test_data"]) == desired).all()
True
>>> (array_from_graph(graph, ["test_data", "test_data"]) ==
...  np.hstack((desired, desired))).all()
True
-
region.util.array_from_graph_or_dict(graph, attr)
-
region.util.array_from_region_list(region_list)
Parameters: region_list (list) – Each list element is an iterable of a region's areas.
Returns: labels – Each element specifies the region of the corresponding area.
Return type: numpy.ndarray
Examples
>>> import numpy as np
>>> obtained = array_from_region_list([{0, 1, 2, 5}, {3, 4}])
>>> desired = np.array([0, 0, 0, 1, 1, 0])
>>> (obtained == desired).all()
True
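The conversion from a list of regions to a label array can be sketched as follows. The function name labels_from_regions is hypothetical, and the sketch assumes that areas are the integers 0 to n-1 and that every area belongs to exactly one region; the library's own implementation may differ.

```python
import numpy as np


def labels_from_regions(region_list):
    """Hypothetical re-implementation; assumes areas are the integers 0..n-1."""
    n_areas = sum(len(region) for region in region_list)
    labels = np.zeros(n_areas, dtype=int)
    for region_id, region in enumerate(region_list):
        for area in region:
            # The position in the array is the area, the value is its region.
            labels[area] = region_id
    return labels


labels = labels_from_regions([{0, 1, 2, 5}, {3, 4}])
```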
-
region.util.assert_feasible(solution, adj, n_regions=None)
Parameters:
- solution (numpy.ndarray) – Array of region labels.
- adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
- n_regions (int or None) – An int represents the desired number of regions. If None, then the number of regions is not checked.
Raises: ValueError – If the clustering is not spatially contiguous. If n_regions is not None, a ValueError is also raised if the number of regions differs from n_regions.
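One way to perform the two documented checks is to count the connected components of each region's sub-graph. The following is a minimal sketch under that assumption; check_feasible is a hypothetical name, and the use of scipy.sparse.csgraph.connected_components is a choice made for this illustration, not necessarily what region.util does.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def check_feasible(solution, adj, n_regions=None):
    """Hypothetical sketch of the documented checks (not the library code)."""
    region_ids = np.unique(solution)
    if n_regions is not None and len(region_ids) != n_regions:
        raise ValueError(f"expected {n_regions} regions, found {len(region_ids)}")
    for region_id in region_ids:
        members = np.where(solution == region_id)[0]
        # Restrict the adjacency matrix to the areas of this region.
        sub_adj = adj[members][:, members]
        n_comp, _ = connected_components(sub_adj, directed=False)
        if n_comp != 1:
            raise ValueError(f"region {region_id} is not spatially contiguous")


# 2 x 3 lattice of areas:  0 | 1 | 2
#                          3 | 4 | 5
neighbors = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5},
             3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
dense = np.zeros((6, 6), dtype=int)
for area, nbrs in neighbors.items():
    dense[area, list(nbrs)] = 1
adj = csr_matrix(dense)

# Top row is region 0, bottom row is region 1: passes silently.
check_feasible(np.array([0, 0, 0, 1, 1, 1]), adj, n_regions=2)

# Region 0 consists of areas 0 and 2, which do not touch.
try:
    check_feasible(np.array([0, 1, 0, 1, 1, 1]), adj)
    rejected = False
except ValueError:
    rejected = True

# Contiguous, but the number of regions does not match n_regions.
try:
    check_feasible(np.array([0, 0, 0, 1, 1, 1]), adj, n_regions=3)
    wrong_count_rejected = False
except ValueError:
    wrong_count_rejected = True
```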
-
region.util.check_solver(solver)
-
region.util.copy_func(f)
Return a copy of a function. This is useful, e.g., for creating aliases (whose docstrings can be changed without affecting the original function). The implementation is taken from https://stackoverflow.com/a/13503277.
Parameters: f (function) –
Returns: g – Copy of f.
Return type: function
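The general technique for copying a function object uses only the standard library. The sketch below follows that approach; copy_function, area and area_alias are names made up for this illustration, and the body is not guaranteed to match region.util.copy_func line by line.

```python
import functools
import types


def copy_function(f):
    """Build a new function object that shares f's code but not its identity."""
    g = types.FunctionType(f.__code__, f.__globals__, name=f.__name__,
                           argdefs=f.__defaults__, closure=f.__closure__)
    # Copy metadata such as __doc__, __module__ and __dict__ from f to g.
    g = functools.update_wrapper(g, f)
    g.__kwdefaults__ = f.__kwdefaults__
    return g


def area(width, height=1):
    """Return the area of a rectangle."""
    return width * height


area_alias = copy_function(area)
# Changing the alias' docstring leaves the original untouched.
area_alias.__doc__ = "Alias of area with its own docstring."
```

Because area_alias is a distinct function object, its docstring can be edited independently, which is the use case the description mentions.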
-
region.util.count(arr, el)
Parameters:
- arr (numpy.ndarray) –
- el (object) –
Returns: result – The number of occurrences of el in arr.
Return type: int
Examples
>>> import numpy as np
>>> arr = np.array([0, 0, 0, 1, 1])
>>> count(arr, 0)
3
>>> count(arr, 1)
2
>>> count(arr, 2)
0
-
region.util.dataframe_to_dict(df, cols)
Parameters:
- df (Union[pandas.DataFrame, geopandas.GeoDataFrame]) –
- cols (Union[str, list]) – If str, then it is the name of a column of df. If list, then it is a list of strings. Each string is the name of a column of df.
Returns: result – The keys are the elements of the DataFrame's index. Each value is a numpy.ndarray holding the corresponding values in the columns specified by cols.
Return type: dict
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({"data": [100, 120, 115]})
>>> result = dataframe_to_dict(df, "data")
>>> result == {0: 100, 1: 120, 2: 115}
True
>>> import numpy as np
>>> df = pd.DataFrame({"data": [100, 120],
...                    "other": [1, 2]})
>>> actual = dataframe_to_dict(df, ["data", "other"])
>>> desired = {0: np.array([100, 1]), 1: np.array([120, 2])}
>>> all(np.array_equal(actual[i], desired[i]) for i in desired)
True
-
region.util.dict_from_graph_attr(graph, attr, array_values=False)
Parameters:
- graph (networkx.Graph) –
- attr (str, iterable, or dict) – If str, then it specifies an attribute of the graph's nodes. If iterable of strings, then multiple attributes of the graph's nodes are specified. If dict, then each key is a node and each value the corresponding attribute value. (This format is also this function's return format.)
- array_values (bool, default: False) – If True, then each value is transformed into a numpy.ndarray.
Returns: result_dict – Each key is a node in the graph. If array_values is False, then each value is a list of attribute values corresponding to the key node. If array_values is True, then each such list of attribute values is turned into a numpy.ndarray, which requires the values to be shape-compatible for stacking.
Return type: dict
Examples
>>> import networkx as nx
>>> edges = [(0, 1), (1, 2),          # 0 | 1 | 2
...          (0, 3), (1, 4), (2, 5),  # ---------
...          (3, 4), (4, 5)]          # 3 | 4 | 5
>>> graph = nx.Graph(edges)
>>> data_dict = {node: 10*node for node in graph}
>>> nx.set_node_attributes(graph, name="test_data", values=data_dict)
>>> desired = {key: [value] for key, value in data_dict.items()}
>>> dict_from_graph_attr(graph, "test_data") == desired
True
>>> dict_from_graph_attr(graph, ["test_data"]) == desired
True
-
region.util.distribute_regions_among_components(component_labels, n_regions)
Returns: result_dict – Each key is a label of a connected component. Each value specifies into how many regions the component is to be clustered.
Return type: dict
-
region.util.find_sublist_containing(el, lst, index=False)
Parameters:
- el – The element to search for in the sublists of lst.
- lst (collections.abc.Sequence) – A sequence of sequences or sets.
- index (bool, default: False) – If False (default), the subsequence or subset containing el is returned. If True, the index of the subsequence or subset in lst is returned.
Returns: result – See the index argument for more information.
Return type: collections.abc.Sequence, collections.abc.Set or int
Raises: LookupError – If el is not in any of the elements of lst.
Examples
>>> lst = [{0, 1}, {2}]
>>> find_sublist_containing(0, lst, index=False) == {0, 1}
True
>>> find_sublist_containing(0, lst, index=True) == 0
True
>>> find_sublist_containing(2, lst, index=False) == {2}
True
>>> find_sublist_containing(2, lst, index=True) == 1
True
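The documented behaviour, including the LookupError case that the examples above do not exercise, can be reproduced with a short linear search. find_containing is a hypothetical name for this sketch, which is not taken from the library.

```python
def find_containing(el, lst, index=False):
    """Hypothetical sketch: return the first element of lst that contains el."""
    for idx, sub in enumerate(lst):
        if el in sub:
            return idx if index else sub
    raise LookupError(f"{el!r} is not in any of the elements of lst")


regions = [{0, 1}, {2}]
found_set = find_containing(0, regions)
found_index = find_containing(2, regions, index=True)

# An element that is in none of the sublists triggers the LookupError.
try:
    find_containing(99, regions)
    missing_raises = False
except LookupError:
    missing_raises = True
```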
-
region.util.generate_initial_sol(adj, n_regions)
Generate a random initial clustering.
Parameters:
- adj (scipy.sparse.csr_matrix) –
- n_regions (int) –
Yields: region_labels (numpy.ndarray) – An array with -1 for areas which are not part of the yielded component and an integer >= 0 specifying the region of areas within the yielded component.
-
region.util.get_metric_function(metric=None)
Parameters: metric (str or function or None, default: None) – Using None is equivalent to using "euclidean".
If str, then this string specifies the distance metric (from scikit-learn) to use for calculating the objective function. Possible values are:
- "cityblock" for sklearn.metrics.pairwise.manhattan_distances
- "cosine" for sklearn.metrics.pairwise.cosine_distances
- "euclidean" for sklearn.metrics.pairwise.euclidean_distances
- "l1" for sklearn.metrics.pairwise.manhattan_distances
- "l2" for sklearn.metrics.pairwise.euclidean_distances
- "manhattan" for sklearn.metrics.pairwise.manhattan_distances
If function, then this function should take two arguments and return a scalar value. Furthermore, the following conditions must be fulfilled:
- d(a, b) >= 0 for all a and b (non-negativity)
- d(a, b) == 0 if and only if a == b (positive definiteness)
- d(a, b) == d(b, a) (symmetry)
- d(a, c) <= d(a, b) + d(b, c) (triangle inequality)
Returns: metric_func – If the metric argument is a function, it is returned. If the metric argument is a string, then the corresponding distance metric function from sklearn.metrics.pairwise is returned.
Return type: function
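Before passing a custom function as metric, the four conditions above can be spot-checked on a sample of points. The sketch below is an illustration only: chebyshev, squared_euclidean and satisfies_metric_conditions are hypothetical helpers, and the documentation does not state whether the library performs any such check itself. A finite sample can expose a violation but cannot prove that a function is a metric.

```python
import itertools

import numpy as np


def chebyshev(a, b):
    """Custom metric: largest absolute coordinate difference."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


def squared_euclidean(a, b):
    """Not a metric: violates the triangle inequality."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))


def satisfies_metric_conditions(d, points, tol=1e-12):
    """Spot-check the four conditions on a finite sample of points."""
    for a, b in itertools.product(points, repeat=2):
        if d(a, b) < 0:                             # non-negativity
            return False
        if (d(a, b) == 0) != np.array_equal(a, b):  # positive definiteness
            return False
        if abs(d(a, b) - d(b, a)) > tol:            # symmetry
            return False
    for a, b, c in itertools.product(points, repeat=3):
        if d(a, c) > d(a, b) + d(b, c) + tol:       # triangle inequality
            return False
    return True


points = [np.array(p) for p in [(0, 0), (1, 0), (2, 0), (-2, 4)]]
chebyshev_ok = satisfies_metric_conditions(chebyshev, points)
squared_ok = satisfies_metric_conditions(squared_euclidean, points)
```

For the squared Euclidean distance, the points (0, 0), (1, 0) and (2, 0) already break the triangle inequality, since 4 > 1 + 1.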
-
region.util.get_solver_instance(solver_string)
-
region.util.make_move(moving_area, new_label, labels)
Modify the labels argument in place (no return value!) such that the area moving_area has the new region label new_label.
Parameters:
- moving_area – The area to be moved (assigned to a new region).
- new_label (int) – The new region label of area moving_area.
- labels (numpy.ndarray) – Each element is the region label of the area corresponding to the array index.
Examples
>>> import numpy as np
>>> labels = np.array([0, 0, 0, 0, 1, 1])
>>> make_move(3, 1, labels)
>>> (labels == np.array([0, 0, 0, 1, 1, 1])).all()
True
-
region.util.pop_randomly_from(lst)
-
region.util.raise_distance_metric_not_set(x, y)
-
region.util.random_element_from(lst)
-
region.util.scipy_sparse_matrix_from_dict(neighbors)
Parameters: neighbors (dict) – Each key represents an area. The corresponding value contains the area's neighbors.
Returns: adj – Adjacency matrix representing the areas' contiguity relation.
Return type: scipy.sparse.csr_matrix
Examples
>>> import numpy as np
>>> neighbors = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5},
...              3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
>>> obtained = scipy_sparse_matrix_from_dict(neighbors)
>>> desired = np.array([[0, 1, 0, 1, 0, 0],
...                     [1, 0, 1, 0, 1, 0],
...                     [0, 1, 0, 0, 0, 1],
...                     [1, 0, 0, 0, 1, 0],
...                     [0, 1, 0, 1, 0, 1],
...                     [0, 0, 1, 0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
>>> neighbors = {"left": {"middle"},
...              "middle": {"left", "right"},
...              "right": {"middle"}}
>>> obtained = scipy_sparse_matrix_from_dict(neighbors)
>>> desired = np.array([[0, 1, 0],
...                     [1, 0, 1],
...                     [0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
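A neighbors dictionary with arbitrary hashable keys can be turned into a CSR adjacency matrix by first mapping each area to a row index. The sketch below assumes that rows and columns follow the sorted order of the keys, which is consistent with the examples above but is not confirmed by the documentation; adjacency_from_neighbors is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix


def adjacency_from_neighbors(neighbors):
    """Hypothetical sketch; row/column order follows the sorted keys (assumed)."""
    areas = sorted(neighbors)
    position = {area: i for i, area in enumerate(areas)}
    rows, cols = [], []
    for area, nbrs in neighbors.items():
        for nbr in nbrs:
            rows.append(position[area])
            cols.append(position[nbr])
    # One entry of value 1 per (area, neighbor) pair.
    data = np.ones(len(rows), dtype=int)
    n = len(areas)
    return csr_matrix((data, (rows, cols)), shape=(n, n))


neighbors = {"left": {"middle"},
             "middle": {"left", "right"},
             "right": {"middle"}}
adj = adjacency_from_neighbors(neighbors)
```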
-
region.util.scipy_sparse_matrix_from_w(w)
Parameters: w (libpysal.weights.weights.W) – A W object representing the areas' contiguity relation.
Returns: adj – Adjacency matrix representing the areas' contiguity relation.
Return type: scipy.sparse.csr_matrix
Examples
>>> import libpysal as ps
>>> import numpy as np
>>> neighbor_dict = {0: {1}, 1: {0, 2}, 2: {1}}
>>> w = ps.weights.W(neighbor_dict)
>>> obtained = scipy_sparse_matrix_from_w(w)
>>> desired = np.array([[0, 1, 0],
...                     [1, 0, 1],
...                     [0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
-
region.util.separate_components(adj, labels)
Take a labels array and yield modifications of it (one modified array per connected component). Each yielded array is unchanged at the indices belonging to the current connected component, so it holds integers >= 0 there. At all other indices the yielded array is -1.
Parameters:
- adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
- labels (numpy.ndarray) –
Yields: comp_dict (numpy.ndarray) – Each yielded array represents one connected component of the graph specified by the adj argument. In a yielded array, each index is an area and each value is the corresponding region ID (or -1 for areas outside the component).
Examples
>>> import networkx as nx
>>> import numpy as np
>>> edges_island1 = [(0, 1), (1, 2),          # 0 | 1 | 2
...                  (0, 3), (1, 4), (2, 5),  # ---------
...                  (3, 4), (4, 5)]          # 3 | 4 | 5
>>>
>>> edges_island2 = [(6, 7),          # 6 | 7
...                  (6, 8), (7, 9),  # -----
...                  (8, 9)]          # 8 | 9
>>>
>>> graph = nx.Graph(edges_island1 + edges_island2)
>>> adj = nx.to_scipy_sparse_matrix(graph)
>>>
>>> # island 1: island divided into regions 0, 1, and 2
>>> sol_island1 = [area%3 for area in range(6)]
>>> # island 2: all areas are in region 3
>>> sol_island2 = [3 for area in range(6, 10)]
>>> labels = np.array(sol_island1 + sol_island2)
>>>
>>> yielded = list(separate_components(adj, labels))
>>> yielded.sort(key=lambda arr: arr[0], reverse=True)
>>> (yielded[0] == np.array([0, 1, 2, 0, 1, 2, -1, -1, -1, -1])).all()
True
>>> (yielded[1] == np.array([-1, -1, -1, -1, -1, -1, 3, 3, 3, 3])).all()
True
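The masking behaviour described above can be sketched with scipy.sparse.csgraph.connected_components. This is an assumption about one possible implementation: split_by_component is a hypothetical name, and the order in which the library yields the components is not specified here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def split_by_component(adj, labels):
    """Hypothetical sketch: yield one masked copy of labels per component."""
    n_comp, comp_ids = connected_components(adj, directed=False)
    for comp in range(n_comp):
        # Start with -1 everywhere, then restore the labels inside the component.
        masked = np.full_like(labels, -1)
        in_comp = comp_ids == comp
        masked[in_comp] = labels[in_comp]
        yield masked


# Two islands: areas 0-1 are connected, and areas 2-3 are connected.
adj = csr_matrix(np.array([[0, 1, 0, 0],
                           [1, 0, 0, 0],
                           [0, 0, 0, 1],
                           [0, 0, 1, 0]]))
labels = np.array([0, 1, 2, 2])
pieces = [tuple(int(v) for v in piece) for piece in split_by_component(adj, labels)]
```

Each tuple in pieces keeps the original labels for one island and holds -1 for the areas of the other island.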
-
region.util.w_from_gdf(gdf, contiguity)
Get a W object from a GeoDataFrame.
Parameters:
- gdf (GeoDataFrame) –
- contiguity ({"rook", "queen"}) –
Returns: weights – The contiguity information contained in the gdf argument in the form of a W object.
Return type: W