granicus_archiver.googledrive.pathtree

class granicus_archiver.googledrive.pathtree.PathNode(part: PathPart, parent: Self | None = None, folder_cache: dict[Path, FileId] | None = None)[source]

Bases: object

Node representing one segment of a collection of filesystem paths

This is primarily used to aid in creating multiple folders within Drive while avoiding costly lookups for each parent folder’s FileId.

Instead, the folder_id for each known node in the tree is read from the folder_cache. A node without a folder_id increases its own cost and the cost of each of its ancestors.

Nodes with higher cost can then be prioritized during folder creation, setting their folder_id once it is known.

Standard container methods are supported (aside from __setitem__ and __delitem__)

Example

>>> a = PathNode('a', folder_cache={})
>>> len(a)
0
>>> 'b' in a
False
>>> b = a.add('b')
>>> len(a)
1
>>> 'b' in a
True
>>> a['b'] is b
True
>>> foo = a.add('foo')
>>> len(a)
2
>>> [item for item in a]
[<PathNode: "a/b">, <PathNode: "a/foo">]
Parameters:
  • part (PathPart) – The path part for the node

  • parent (Self | None) – The node’s parent (or None for the root node)

  • folder_cache (FolderCache | None) – The FolderCache for the entire tree. This is only required/valid for the root node.

part: PathPart

The path part

parent: Self | None

The parent node (or None if this is the root node)

full_path: Path

The full path as a combination of all parent parts

nest_level: int

The depth of this node from its root (with the root node beginning at 0)

children: dict[PathPart, PathNode]

Direct children of this node

classmethod create_from_paths(*paths: PathLike, folder_cache: dict[Path, FileId]) PathNode[source]

Create a root node from the given path(s)

If multiple paths are given, they must all have a common root directory.

Parameters:
  • paths (PathLike)

  • folder_cache (dict[Path, FileId])

Return type:

PathNode

Example

>>> root = PathNode.create_from_paths('a/a', 'a/b', 'a/c', folder_cache={})
>>> root
<PathNode: "a">
>>> aa = root['a']
>>> aa
<PathNode: "a/a">
>>> [node for node in root.walk()]
[<PathNode: "a">, <PathNode: "a/a">, <PathNode: "a/b">, <PathNode: "a/c">]
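
The common-root requirement can be illustrated with a minimal sketch that builds a nested dict from path strings. The build_tree helper and the choice to raise ValueError are assumptions made for illustration; they are not part of this module, and the sketch does not show how create_from_paths is actually implemented.

```python
from pathlib import PurePosixPath


def build_tree(*paths: str) -> dict:
    """Hypothetical helper: nest path parts into dicts keyed by part."""
    tree: dict = {}
    for path in paths:
        level = tree
        for part in PurePosixPath(path).parts:
            # Reuse the existing branch for this part or start a new one
            level = level.setdefault(part, {})
    # Assumption: anything other than exactly one top-level part is rejected
    if len(tree) != 1:
        raise ValueError('paths must share a single root directory')
    return tree


tree = build_tree('a/a', 'a/b', 'a/c')
```

Here tree holds a single top-level key 'a' with three empty children, mirroring the root and child nodes in the example above.
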
property id: NodeId

Unique id for the node

property node_pos: NodePos

A namedtuple of id and nest_level

property root: Self

The root of the tree

property folder_id: FileId | None

The FileId of the folder matching full_path (if it exists)

property cost: int

The total number of the node’s descendants with no folder_id (including the node itself)

Example

>>> a = PathNode.create_from_paths('a/b/c', folder_cache={})
>>> b = a['b']
>>> c = b['c']
>>> nodes = [a, b, c]

Each node at this point has a local “cost” of 1 which is added to its parent’s overall cost.

>>> for node in nodes:
...     print(node.cost, node.folder_id, node)
3 None a
2 None a/b
1 None a/b/c
>>> c.folder_id = 'c_id'

The last descendant now has a folder_id which reduces the cost up the tree

>>> for node in nodes:
...     print(node.cost, node.folder_id, node)
2 None a
1 None a/b
0 c_id a/b/c
>>> a.folder_id = 'a_id'

The root node now has a folder_id, but still has the cost of its children added.

>>> for node in nodes:
...     print(node.cost, node.folder_id, node)
1 a_id a
1 None a/b
0 c_id a/b/c
>>> b.folder_id = 'b_id'

Now all nodes have a folder_id and a cost of zero.

>>> for node in nodes:
...     print(node.cost, node.folder_id, node)
0 a_id a
0 b_id a/b
0 c_id a/b/c
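
The numbers above are consistent with a simple recursive rule: a node contributes 1 while its folder_id is unknown, plus the cost of each child. The sketch below shows that rule with a hypothetical SketchNode class. It is a stand-in for illustration only and is not taken from the PathNode source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for PathNode (illustration only)."""
    part: str
    folder_id: str | None = None
    children: dict[str, SketchNode] = field(default_factory=dict)

    @property
    def cost(self) -> int:
        # Local cost is 1 while folder_id is unknown, 0 once it is set
        local = 1 if self.folder_id is None else 0
        # Each child's cost is added on top of the local cost
        return local + sum(child.cost for child in self.children.values())


c = SketchNode('c')
b = SketchNode('b', children={'c': c})
a = SketchNode('a', children={'b': b})

costs_before = [node.cost for node in (a, b, c)]
c.folder_id = 'c_id'
costs_after = [node.cost for node in (a, b, c)]
```

With these definitions, costs_before is [3, 2, 1] and costs_after is [2, 1, 0], matching the first two steps of the example.
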
property folder_cache: dict[Path, FileId]

FolderCache used to populate the folder_id for all nodes in the tree

This is only stored on the root node and will persist across the tree. This allows cache updates to be re-read by all nodes.
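
One way to picture a cache that is stored only on the root yet visible to every node is a property that defers to the root. The CacheSketchNode class below is hypothetical and only sketches that idea; the real class may organize this differently.

```python
from pathlib import PurePosixPath


class CacheSketchNode:
    """Hypothetical node in which only the root holds the cache dict."""

    def __init__(self, part, parent=None, folder_cache=None):
        self.part = part
        self.parent = parent
        # Assumption: left as None on every node except the root
        self._folder_cache = folder_cache

    @property
    def root(self):
        return self if self.parent is None else self.parent.root

    @property
    def full_path(self):
        if self.parent is None:
            return PurePosixPath(self.part)
        return self.parent.full_path / self.part

    @property
    def folder_cache(self):
        # Every node reads the single dict stored on the root
        return self.root._folder_cache


cache = {}
a = CacheSketchNode('a', folder_cache=cache)
b = CacheSketchNode('b', parent=a)

# An update made to the dict later is visible from any node
cache[PurePosixPath('a/b')] = 'b_id'
found = b.folder_cache.get(b.full_path)
```

Because b resolves the cache through its root, the entry added after the tree was built is still found.
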

add(item: PathPart | Path) PathNode[source]

Add a node (or nodes) to self from the given path

If the given item is a Path instance, it must contain this node’s part at its root. Any sub-directories will be created in the tree and the node representing the final element of the path will be returned.

If item is a PathPart, a child of self will be added and returned with its part set to that value.

Example

>>> from pathlib import Path
>>> a = PathNode('a', folder_cache={})
>>> b = a.add('b')
>>> c = b.add('c')
>>> a, b, c
(<PathNode: "a">, <PathNode: "a/b">, <PathNode: "a/b/c">)

This fails since the root is not included

>>> foobar = a.add(Path('foo') / 'bar')
Traceback (most recent call last):
    ...
ValueError: path "foo/bar" is not relative to "a"
>>> foobar = a.add(Path('a') / 'foo' / 'bar')
>>> foobar
<PathNode: "a/foo/bar">
Parameters:

item (PathPart | Path)

Return type:

PathNode
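
The requirement that a Path item contain this node's part at its root resembles the check performed by pathlib's relative_to. The sketch below uses that standard-library method to find the parts that would become sub-nodes. The parts_below helper is hypothetical, and whether add uses relative_to internally is an assumption.

```python
from pathlib import PurePosixPath


def parts_below(node_path: str, item: str) -> tuple[str, ...]:
    """Hypothetical helper: parts of item that lie below node_path."""
    # relative_to raises ValueError when item does not start with node_path
    return PurePosixPath(item).relative_to(PurePosixPath(node_path)).parts


below = parts_below('a', 'a/foo/bar')

try:
    parts_below('a', 'foo/bar')
    rejected = False
except ValueError:
    rejected = True
```

Here below is ('foo', 'bar'), the two parts that would be created under "a", while the path that does not start with "a" is rejected.
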

find(path: PathLike | NodeId) PathNode | None[source]

Find a node with its full_path matching the given path

The search will include self and all of its descendants. If no match is found, None is returned.

Example

>>> a = PathNode('a', folder_cache={})
>>> b = a.add('b')
>>> c = b.add('c')
>>> a.find('a/b') is b
True
>>> a.find('a/b/c') is c
True
Parameters:

path (PathLike | NodeId)

Return type:

PathNode | None

update_from_cache() None[source]

Update from the folder_cache for this node and all of its descendants

This method should be called after any updates to the cache are known to have occurred.

The full_path of each node is looked up in the cache and, if found, the node's folder_id is set. This in turn updates the cost attribute of each affected node in the tree.

Example

>>> from pathlib import Path
>>> folder_cache = {}
>>> a = PathNode.create_from_paths('a/b/c', folder_cache=folder_cache)
>>> b = a['b']
>>> c = b['c']
>>> nodes = [a, b, c]
>>> for node in nodes:
...     print(node.cost, node.folder_id, node)
3 None a
2 None a/b
1 None a/b/c

Set the cached id for Path('a/b/c'), then update the tree

>>> folder_cache[Path('a/b/c')] = 'c_id'
>>> a.update_from_cache()
>>> for node in nodes:
...     print(node.cost, node.folder_id, node)
2 None a
1 None a/b
0 c_id a/b/c
Return type:

None

count() int[source]

Get the total number of descendants at this point in the tree (including self)

Example

>>> a = PathNode.create_from_paths('a/b/c', folder_cache={})
>>> a.count()
3
>>> foo = a.add('foo')
>>> a.count()
4
>>> bar = foo.add('bar')
>>> foo.count()
2
>>> a.count()
5
Return type:

int

get(key: PathPart) PathNode | None[source]

Get the direct child of self whose part matches key

If not found, None is returned.

Parameters:

key (PathPart)

Return type:

PathNode | None

walk(breadth_first: bool = False) Iterator[PathNode][source]

Recursively iterate over all descendants from this point in the tree

If breadth_first is True, the child nodes for each nest_level will be iterated before advancing deeper in the tree.

If breadth_first is False (default), all descendants will be iterated to their deepest nest_level before advancing to the next branch (depth-first).

Example

>>> paths = ['a/a/a', 'a/a/b', 'a/b/a', 'a/b/b', 'a/c']
>>> root = PathNode.create_from_paths(*paths, folder_cache={})
>>> for node in root.walk():
...     print(node)
a
a/a
a/a/a
a/a/b
a/b
a/b/a
a/b/b
a/c
>>> for node in root.walk(breadth_first=True):
...     print(node)
a
a/a
a/b
a/c
a/a/a
a/a/b
a/b/a
a/b/b
Parameters:

breadth_first (bool)

Return type:

Iterator[PathNode]
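
The two orderings differ only in how pending nodes are scheduled: a queue gives breadth-first order and a stack gives depth-first order. The sketch below reproduces both orderings over a nested dict. The tree layout and the walk function here are hypothetical stand-ins and not the method's actual implementation.

```python
from collections import deque
from typing import Iterator

# Hypothetical tree as nested dicts keyed by path part
tree = {'a': {'a': {'a': {}, 'b': {}}, 'b': {'a': {}, 'b': {}}, 'c': {}}}


def walk(tree: dict, breadth_first: bool = False) -> Iterator[str]:
    """Yield full paths depth-first (default) or breadth-first."""
    pending = deque(tree.items())
    while pending:
        path, sub = pending.popleft()
        yield path
        children = [(f'{path}/{part}', child) for part, child in sub.items()]
        if breadth_first:
            # Queue: children wait behind everything already pending
            pending.extend(children)
        else:
            # Stack: children are visited before the remaining siblings
            pending.extendleft(reversed(children))


depth_order = list(walk(tree))
breadth_order = list(walk(tree, breadth_first=True))
```

Both lists contain the same eight paths; depth_order finishes each branch before moving to the next, while breadth_order visits every path at one nest level before descending.
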

class granicus_archiver.googledrive.pathtree.PathPart

Type alias used for PathNode.part

alias of str

class granicus_archiver.googledrive.pathtree.NodeId

Type alias used for PathNode.id

alias of str

class granicus_archiver.googledrive.pathtree.NodePos(id: NodeId, level: int)[source]

Bases: NamedTuple

Position of a PathNode within its tree

Parameters:
  • id (NodeId)

  • level (int)

id: NodeId

The node’s id

level: int

The node’s nest_level