granicus_archiver.legistar.model

granicus_archiver.legistar.model.is_attachment_uid(uid: LegistarFileUID) -> bool

Returns True if the given uid is an attachment reference

Parameters:

uid (LegistarFileUID)

Return type:

bool

granicus_archiver.legistar.model.uid_to_attachment_name(uid: LegistarFileUID) -> AttachmentName

Convert the given LegistarFileUID to an AttachmentName

Raises:

TypeError – If the uid is not an attachment reference

Parameters:

uid (LegistarFileUID)

Return type:

AttachmentName

granicus_archiver.legistar.model.attachment_name_to_uid(name: AttachmentName) -> LegistarFileUID

Convert the given AttachmentName to a LegistarFileUID

Parameters:

name (AttachmentName)

Return type:

LegistarFileUID

granicus_archiver.legistar.model.uid_to_file_key(uid: LegistarFileUID) -> Literal['agenda', 'minutes', 'agenda_packet', 'video']

Convert the given LegistarFileUID to a LegistarFileKey

Raises:

TypeError – If the uid is not a valid key

Parameters:

uid (LegistarFileUID)

Return type:

Literal['agenda', 'minutes', 'agenda_packet', 'video']

granicus_archiver.legistar.model.file_key_to_uid(key: Literal['agenda', 'minutes', 'agenda_packet', 'video']) -> LegistarFileUID

Convert the given LegistarFileKey to a LegistarFileUID

Parameters:

key (Literal['agenda', 'minutes', 'agenda_packet', 'video'])

Return type:

LegistarFileUID
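
The helpers above form a round trip between file keys, attachment names and a single uid namespace. The following is a minimal sketch of that idea using stand-in functions. The `attachment:` prefix and the use of plain `str` for every type are assumptions made for illustration only; the actual uid encoding used by granicus_archiver is not described on this page.

```python
from typing import Literal, get_args

LegistarFileKey = Literal['agenda', 'minutes', 'agenda_packet', 'video']
FILE_KEYS = get_args(LegistarFileKey)

# Hypothetical marker for attachment uids (an assumption for this sketch only)
ATTACHMENT_PREFIX = 'attachment:'


def is_attachment_uid(uid: str) -> bool:
    """True if the uid refers to an attachment."""
    return uid.startswith(ATTACHMENT_PREFIX)


def attachment_name_to_uid(name: str) -> str:
    """Wrap an attachment name so it cannot collide with a file key."""
    return f'{ATTACHMENT_PREFIX}{name}'


def uid_to_attachment_name(uid: str) -> str:
    """Unwrap an attachment uid, rejecting anything else with TypeError."""
    if not is_attachment_uid(uid):
        raise TypeError(f'{uid!r} is not an attachment reference')
    return uid[len(ATTACHMENT_PREFIX):]


def file_key_to_uid(key: str) -> str:
    """File keys are used as their own uid in this sketch."""
    return key


def uid_to_file_key(uid: str) -> str:
    """Return the file key for a uid, rejecting anything else with TypeError."""
    if uid not in FILE_KEYS:
        raise TypeError(f'{uid!r} is not a valid key')
    return uid
```

Checking `is_attachment_uid()` first lets a caller pick the right conversion without relying on the `TypeError`.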

class granicus_archiver.legistar.model.FilePathURL(key: KT, filename: Path, url: URL)

Bases: NamedTuple, Generic[KT]

Parameters:
  • key (KT)

  • filename (Path)

  • url (URL)

key: KT

Alias for field number 0

filename: Path

Alias for field number 1

url: URL

Alias for field number 2

class granicus_archiver.legistar.model.FilePathURLComplete(key: KT, filename: Path, url: URL, complete: bool)

Bases: NamedTuple, Generic[KT]

Parameters:
  • key (KT)

  • filename (Path)

  • url (URL)

  • complete (bool)

key: KT

Alias for field number 0

filename: Path

Alias for field number 1

url: URL

Alias for field number 2

complete: bool

Alias for field number 3
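
"Alias for field number N" means that each named field is also reachable by position, as with any named tuple. The sketch below shows this with a simplified stand-in: the generic key type and the URL class are both replaced by `str`, and the URL is a placeholder.

```python
from pathlib import Path
from typing import NamedTuple


class FilePathURLComplete(NamedTuple):
    """Simplified stand-in: key and url are plain strings here."""
    key: str
    filename: Path
    url: str
    complete: bool


item = FilePathURLComplete(
    key='agenda',
    filename=Path('agenda.pdf'),
    url='https://example.com/agenda.pdf',  # placeholder URL
    complete=False,
)

# Fields are available by name or by position
assert item.key == item[0]
assert item.complete == item[3]

# Tuples unpack directly, which is convenient in download loops
key, filename, url, complete = item

# Select the entries that still need to be downloaded
pending = [f for f in [item] if not f.complete]
```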

class granicus_archiver.legistar.model.UpdateResult(changed: bool, link_keys: list[LegistarFileKey], attachment_keys: list[AttachmentName], attributes: dict[str, Any] | None = None)

Bases: NamedTuple

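Parameters:
  • changed (bool)

  • link_keys (list[LegistarFileKey])

  • attachment_keys (list[AttachmentName])

  • attributes (dict[str, Any] | None)

changed: bool

Whether any changes were made

link_keys: list[LegistarFileKey]

Any URL attributes from DetailPageLinks that changed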

attachment_keys: list[AttachmentName]

Any keys in DetailPageLinks.attachments that changed

attributes: dict[str, Any] | None

Attributes of DetailPageResult that changed

class granicus_archiver.legistar.model.AbstractFile(name: KT, filename: Path, metadata: FileMeta, pdf_links_removed: bool = False)

Bases: Serializable, ABC, Generic[KT]

Abstract base class for file information

Parameters:
  • name (KT)

  • filename (Path)

  • metadata (FileMeta)

  • pdf_links_removed (bool)

name: KT

File key

filename: Path

Local file path

metadata: FileMeta

The file’s metadata

pdf_links_removed: bool = False

Whether the embedded pdf links of the file have been removed

property is_pdf: bool

True if this is a pdf file

remove_pdf_links() -> bool

Strip embedded links from the pdf file

If the file is not a pdf or if pdf_links_removed is already set to True, no alteration will be performed.

This removes only the URL from each hardcoded link and does not reformat the text. The link text will still appear blue and underlined, but will no longer be clickable or have a URL action.

The resulting file will have the same path and the metadata will be updated with the new file size (types.FileMeta.content_length).

The pdf_links_removed flag will then be set to True

Return type:

bool

class granicus_archiver.legistar.model.LegistarFile(name: KT, filename: Path, metadata: FileMeta, pdf_links_removed: bool = False)

Bases: AbstractFile[Literal['agenda', 'minutes', 'agenda_packet', 'video']]

Information for a downloaded file within LegistarFiles.files using LegistarFileKey for the name attribute

Parameters:
  • name (KT)

  • filename (Path)

  • metadata (FileMeta)

  • pdf_links_removed (bool)

class granicus_archiver.legistar.model.AttachmentFile(name: KT, filename: Path, metadata: FileMeta, pdf_links_removed: bool = False)

Bases: AbstractFile[AttachmentName]

Information for a downloaded attachment within LegistarFiles.attachments using AttachmentName for the name attribute

Parameters:
  • name (KT)

  • filename (Path)

  • metadata (FileMeta)

  • pdf_links_removed (bool)

class granicus_archiver.legistar.model.LegistarFiles(guid: GUID, files: dict[Literal['agenda', 'minutes', 'agenda_packet', 'video'], LegistarFile] = <factory>, attachments: dict[AttachmentName, AttachmentFile | None] = <factory>)

Bases: Serializable

Collection of files for a DetailPageResult

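Parameters:
  • guid (GUID)

  • files (dict[Literal['agenda', 'minutes', 'agenda_packet', 'video'], LegistarFile])

  • attachments (dict[AttachmentName, AttachmentFile | None])
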
guid: GUID

The guid of the DetailPageResult

files: dict[Literal['agenda', 'minutes', 'agenda_packet', 'video'], LegistarFile]

Downloaded LegistarFile information

attachments: dict[AttachmentName, AttachmentFile | None]

Additional file attachments as AttachmentFile objects

Call remove_pdf_links() on all files and attachments

Return type:

bool

get_file_uid(key: Literal['agenda', 'minutes', 'agenda_packet', 'video']) -> LegistarFileUID

Get a unique key for the given LegistarFileKey

Parameters:

key (Literal['agenda', 'minutes', 'agenda_packet', 'video'])

Return type:

LegistarFileUID

get_attachment_uid(name: AttachmentName) -> LegistarFileUID

Get a unique key for the given AttachmentName

Parameters:

name (AttachmentName)

Return type:

LegistarFileUID

resolve_file_uid(uid: LegistarFileUID) -> tuple[Literal['agenda', 'minutes', 'agenda_packet', 'video'] | AttachmentName, bool]

Resolve the LegistarFileUID to its original form

Returns:

Return type:

(tuple)

Parameters:

uid (LegistarFileUID)

resolve_uid(uid: LegistarFileUID) -> LegistarFile | AttachmentFile | None

Get the LegistarFile or AttachmentFile referenced by the given uid (or None if it does not exist)

Parameters:

uid (LegistarFileUID)

Return type:

LegistarFile | AttachmentFile | None

ensure_local_hashes(legistar_data: LegistarData, check_existing: bool = False) -> bool

Ensure that all local files have an sha1 hash stored in their metadata

Parameters:
  • check_existing (bool) – If True, the hash of the local file will be checked against the stored hash

  • legistar_data (LegistarData)

Returns:

True if any hashes were generated or updated

Return type:

bool
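
As an illustration of the hashing step described above, the sketch below computes a sha1 digest with `hashlib` and stores it when it is missing or, if requested, when it no longer matches. The helper names are hypothetical, a plain `dict` stands in for FileMeta, and replacing a mismatched hash (rather than raising an error) is an assumption.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex sha1 digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def ensure_hash(path: Path, meta: dict, check_existing: bool = False) -> bool:
    """Store a sha1 in ``meta`` if missing; optionally re-check a stored one.

    Returns True if the hash was generated or updated.
    ``meta`` is a plain dict standing in for FileMeta (an assumption).
    """
    stored = meta.get('sha1')
    if stored is not None and not check_existing:
        # A hash exists and the caller did not ask for verification
        return False
    current = sha1_of_file(path)
    if current == stored:
        return False
    meta['sha1'] = current
    return True
```

Reading in chunks keeps memory use flat for large files such as agenda packets.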

class granicus_archiver.legistar.model.DetailPageLinks

Bases: Serializable

Links gathered from a meeting detail page

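Parameters:
  • agenda (URL | None)

  • minutes (URL | None)

  • agenda_packet (URL | None)

  • video (URL | None)

  • attachments (dict[AttachmentName, URL])
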
agenda: URL | None

Agenda URL

minutes: URL | None

Minutes URL

agenda_packet: URL | None

Agenda Packet URL

video: URL | None

Video player URL

attachments: dict[AttachmentName, URL]

Attachment URLs

get_clip_id_from_video() -> CLIP_ID | None

Parse the clip_id from the video url (if it exists)

Return type:

CLIP_ID | None
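
A minimal sketch of this kind of parsing, assuming the clip id is carried as a `clip_id` query parameter of the video URL. The function name, the placeholder URL and the `str` return type are illustrative and not taken from the library.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def clip_id_from_video_url(url: Optional[str]) -> Optional[str]:
    """Return the ``clip_id`` query parameter of a video URL, if present."""
    if url is None:
        return None
    query = parse_qs(urlparse(url).query)
    values = query.get('clip_id')
    return values[0] if values else None


# Placeholder URL for illustration
example = 'https://example.com/MediaPlayer.php?view_id=1&clip_id=1234'
clip_id = clip_id_from_video_url(example)
```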

iter_uids() -> Iterator[tuple[LegistarFileUID, URL | None]]

Iterate over all files and attachments as LegistarFileUID, URL tuples

Return type:

Iterator[tuple[LegistarFileUID, URL | None]]

update(other: Self) -> UpdateResult

Update self with any changes from other

Parameters:

other (Self)

Return type:

UpdateResult

class granicus_archiver.legistar.model.DetailPageResult(page_url: URL, feed_guid: GUID, location: str, links: DetailPageLinks, agenda_status: Literal['Final', 'Final-Addendum', 'Draft', 'Not Viewable by the Public'], minutes_status: Literal['Final', 'Final-Addendum', 'Draft', 'Not Viewable by the Public'], feed_item: FeedItem, last_fake_stupid_guid: GUID | None = None)

Bases: Serializable

Data gathered from a meeting detail (/MeetingDetail.aspx) page

Parameters:
  • page_url (URL)

  • feed_guid (GUID)

  • location (str)

  • links (DetailPageLinks)

  • agenda_status (Literal['Final', 'Final-Addendum', 'Draft', 'Not Viewable by the Public'])

  • minutes_status (Literal['Final', 'Final-Addendum', 'Draft', 'Not Viewable by the Public'])

  • feed_item (FeedItem)

  • last_fake_stupid_guid (GUID | None)

page_url: URL

The detail page url (from rss_parser.FeedItem.link)

feed_guid: GUID

The rss_parser.FeedItem.guid

location: str

The meeting’s location (where it takes place)

links: DetailPageLinks

URL data

agenda_status: Literal['Final', 'Final-Addendum', 'Draft', 'Not Viewable by the Public']

Agenda status

minutes_status: Literal['Final', 'Final-Addendum', 'Draft', 'Not Viewable by the Public']

Minutes status

feed_item: FeedItem

The FeedItem associated with this instance

last_fake_stupid_guid: GUID | None = None

The last known guid with the stupid timestamp part

This is to avoid having to reparse the detail pages since Legistar apparently likes to change ALL the guids at the beginning of every year. SMH this is ridiculous!

property clip_id: CLIP_ID | None

The clip_id parsed from DetailPageLinks.get_clip_id_from_video()

property agenda_final: bool

True if agenda_status is final

property minutes_final: bool

True if minutes_status is final

property is_addendum: bool

True if agenda_status or minutes_status is "Final-Addendum"

property item_status: Literal['final', 'addendum', 'draft', 'hidden']

Overall item status

One of:

  • "final"

  • "addendum"

  • "draft"

  • "hidden"

property can_download: bool

Whether this item may be safely downloaded

property is_draft: bool

True if agenda_status or minutes_status is set to "Draft"

property is_hidden: bool

Whether the item is hidden (if agenda_status is “Not Viewable by the Public”)
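
The status properties above can be read as a reduction of the two per-document statuses to one overall value. The sketch below shows one way to express that reduction. The function is a stand-in, and the precedence used (hidden, then draft, then addendum, then final) is an assumption rather than documented behavior.

```python
from typing import Literal

DocStatus = Literal['Final', 'Final-Addendum', 'Draft', 'Not Viewable by the Public']
ItemStatus = Literal['final', 'addendum', 'draft', 'hidden']


def item_status(agenda_status: DocStatus, minutes_status: DocStatus) -> ItemStatus:
    """Collapse the agenda and minutes statuses into one overall status.

    The order of the checks below is an assumption made for this sketch.
    """
    statuses = (agenda_status, minutes_status)
    if agenda_status == 'Not Viewable by the Public':
        return 'hidden'
    if 'Draft' in statuses:
        return 'draft'
    if 'Final-Addendum' in statuses:
        return 'addendum'
    return 'final'
```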

property is_future: bool

Alias for rss_parser.FeedItem.is_future

property is_in_past: bool

Alias for rss_parser.FeedItem.is_in_past

property real_guid: REAL_GUID

Alias for rss_parser.FeedItem.real_guid

get_unique_folder() -> Path

Get a local path to store files for this item

The structure will be:

<category>/<year>/<datetime>_<title>_<status>

Where

<category>

Is the category of the feed_item

<year>

Is the 4-digit year of the meeting_date

<datetime>

Is the meeting_date (formatted as "%Y%m%d-%H%M")

<title>

Is the title

<status>

Is the item_status

This combination was chosen to ensure uniqueness.

Return type:

Path
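
The folder layout described above can be sketched as follows. The function name and sample values are hypothetical, and no sanitizing of the title is attempted here, although a real implementation may need to replace characters that are unsafe in file names.

```python
from datetime import datetime
from pathlib import Path


def unique_folder(category: str, meeting_date: datetime, title: str, status: str) -> Path:
    """Build ``<category>/<year>/<datetime>_<title>_<status>`` as a relative path."""
    stamp = meeting_date.strftime('%Y%m%d-%H%M')
    year = f'{meeting_date.year:04d}'
    return Path(category) / year / f'{stamp}_{title}_{status}'


# Sample values chosen for illustration
folder = unique_folder(
    category='City Council',
    meeting_date=datetime(2024, 3, 5, 18, 30),
    title='Regular Meeting',
    status='final',
)
```

Including the status in the last component means a draft and a final version of the same meeting resolve to different folders.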

classmethod from_html(html_str: str | bytes, feed_item: FeedItem) -> Self

Create an instance from the raw html from page_url

Parameters:
  • html_str (str | bytes)

  • feed_item (FeedItem)

Return type:

Self

copy() -> Self

Return a deep copy of this item and its children

Return type:

Self

update(other: Self) -> UpdateResult

Update self with changed attributes in other

Parameters:

other (Self)

Return type:

UpdateResult

class granicus_archiver.legistar.model.AbstractLegistarModel(root_dir: Path)

Bases: Serializable, Generic[_GuidT, _ItemT]

Parameters:

root_dir (Path)

filter_by_category(*categories: Category, items: dict[_GuidT, _ItemT] | None = None) -> dict[_GuidT, _ItemT]

Filter items by category

Parameters:
  • *categories (Category) – One or many categories to filter by

  • items (dict[_GuidT, _ItemT] | None) – Items to filter. If not given, all existing items in detail_results will be used.

Return type:

dict[_GuidT, _ItemT]
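
Because the filter both accepts and returns a mapping of items, filters can be chained by passing one result as the `items` argument of the next. The sketch below illustrates that shape with a hypothetical `Item` type and a module-level `ALL_ITEMS` dict standing in for detail_results. How the real method behaves when no categories are given is not covered here.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    """Hypothetical stand-in for an item that exposes a category."""
    guid: str
    category: str


# Stand-in for detail_results
ALL_ITEMS = {
    'g1': Item('g1', 'Council'),
    'g2': Item('g2', 'Planning'),
    'g3': Item('g3', 'Council'),
}


def filter_by_category(
    *categories: str,
    items: Optional[dict[str, Item]] = None,
) -> dict[str, Item]:
    """Keep only the items whose category is one of ``categories``."""
    if items is None:
        items = ALL_ITEMS
    wanted = set(categories)
    return {guid: item for guid, item in items.items() if item.category in wanted}


# Chain two filters by feeding the first result into the second
broad = filter_by_category('Council', 'Planning')
narrow = filter_by_category('Planning', items=broad)
```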

filter_by_dt_range(start_dt: datetime | None, end_dt: datetime | None, items: dict[_GuidT, _ItemT] | None = None) -> dict[_GuidT, _ItemT]

Filter items by their meeting_date

Parameters:
  • start_dt (datetime | None) – If given, items before this datetime will be filtered out

  • end_dt (datetime | None) – If given, items after this datetime will be filtered out

  • items (dict[_GuidT, _ItemT] | None) – Items to filter. If not given, all existing items in detail_results will be used.

Return type:

dict[_GuidT, _ItemT]

Note

If start_dt or end_dt is not timezone-aware (no tzinfo), the configured local timezone is assumed.
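
The handling of naive datetimes can be sketched as below. `LOCAL_TZ` is a fixed offset standing in for the configured local timezone, the items are reduced to a mapping of guid to an aware meeting date, and the bounds are treated as inclusive. All three are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Stand-in for the configured local timezone (fixed offset chosen for this sketch)
LOCAL_TZ = timezone(timedelta(hours=-6))


def as_aware(dt: datetime) -> datetime:
    """Attach LOCAL_TZ to naive datetimes; leave aware ones untouched."""
    return dt.replace(tzinfo=LOCAL_TZ) if dt.tzinfo is None else dt


def filter_by_dt_range(
    items: dict[str, datetime],
    start_dt: Optional[datetime],
    end_dt: Optional[datetime],
) -> dict[str, datetime]:
    """Keep entries whose meeting date lies within [start_dt, end_dt]."""
    start = as_aware(start_dt) if start_dt is not None else None
    end = as_aware(end_dt) if end_dt is not None else None
    result = {}
    for guid, meeting_date in items.items():
        if start is not None and meeting_date < start:
            continue  # before the range
        if end is not None and meeting_date > end:
            continue  # after the range
        result[guid] = meeting_date
    return result
```

Normalizing the bounds first avoids the `TypeError` Python raises when a naive datetime is compared with an aware one.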

class granicus_archiver.legistar.model.LegistarData(root_dir: Path, matched_guids: dict[CLIP_ID, GUID] = <factory>, matched_real_guids: dict[CLIP_ID, REAL_GUID] = <factory>, detail_results: dict[GUID, DetailPageResult] = <factory>, items_by_clip_id: dict[CLIP_ID, DetailPageResult] = <factory>, files: dict[GUID, LegistarFiles] = <factory>, clip_id_overrides: dict[REAL_GUID, CLIP_ID | Literal[_DoesNotExistEnum.DoesNotExist]] = <factory>)

Bases: AbstractLegistarModel[GUID, DetailPageResult]

Container for data gathered from Legistar

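Parameters:
  • root_dir (Path)

  • matched_guids (dict[CLIP_ID, GUID])

  • matched_real_guids (dict[CLIP_ID, REAL_GUID])

  • detail_results (dict[GUID, DetailPageResult])

  • items_by_clip_id (dict[CLIP_ID, DetailPageResult])

  • files (dict[GUID, LegistarFiles])

  • clip_id_overrides (dict[REAL_GUID, CLIP_ID | Literal[_DoesNotExistEnum.DoesNotExist]])
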
root_dir: Path

Root filesystem path for downloading assets

matched_guids: dict[CLIP_ID, GUID]

Clips that have been matched to FeedItems

matched_real_guids: dict[CLIP_ID, REAL_GUID]

Similar to matched_guids, but uses REAL_GUID

detail_results: dict[GUID, DetailPageResult]

Mapping of parsed DetailPageResult items with their feed_guid as keys

items_by_clip_id: dict[CLIP_ID, DetailPageResult]

Mapping of items in detail_results with a valid clip_id

files: dict[GUID, LegistarFiles]

Mapping of downloaded LegistarFiles with their guid as keys

clip_id_overrides: dict[REAL_GUID, CLIP_ID | Literal[_DoesNotExistEnum.DoesNotExist]]

Mapping of items manually-linked to Clips

get_clip_id_for_guid(guid: GUID, use_overrides: bool = True) -> CLIP_ID | None | Literal[_DoesNotExistEnum.DoesNotExist]

Get the clip id linked to the given guid

Parameters:
  • guid (GUID)

  • use_overrides (bool)

Return type:

CLIP_ID | None | Literal[_DoesNotExistEnum.DoesNotExist]

Returns one of:

  • clip_id (CLIP_ID)

    The matched Clip.id (if one was found)

  • NoClip

    If the item has been explicitly set to have no Clip associated with it

  • None

    If no match was found
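
The three possible results rely on a sentinel that is distinct from None, so "explicitly has no clip" and "no match found" stay separate. The sketch below shows the pattern with a locally defined enum. The definitions, the guid-keyed dicts and the `get_clip_id` helper are simplified stand-ins and do not reproduce the library's own data layout.

```python
import enum
from typing import Literal, Union


class _DoesNotExistEnum(enum.Enum):
    """Stand-in sentinel type defined locally for this sketch."""
    DoesNotExist = enum.auto()


NoClip = _DoesNotExistEnum.DoesNotExist
ClipResult = Union[str, None, Literal[_DoesNotExistEnum.DoesNotExist]]


def get_clip_id(
    guid: str,
    matched: dict[str, str],
    overrides: dict[str, ClipResult],
    use_overrides: bool = True,
) -> ClipResult:
    """Look up a clip id, letting manual overrides win when enabled."""
    if use_overrides and guid in overrides:
        return overrides[guid]
    return matched.get(guid)


# Sample data for illustration
matched = {'g1': '100', 'g2': '200'}
overrides = {'g2': NoClip}

result = get_clip_id('g2', matched, overrides)
if result is NoClip:
    outcome = 'explicitly no clip'
elif result is None:
    outcome = 'no match'
else:
    outcome = f'clip {result}'
```

Comparing with `is` keeps the two "no id" cases apart, which a plain truthiness check would not.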

get_future_items() -> Iterator[DetailPageResult]

Iterate over any items in detail_results that are in the future

Return type:

Iterator[DetailPageResult]

ensure_no_future_items() -> None

Ensure there are no items in detail_results that are in the future

Return type:

None

ensure_unique_item_folders() -> None

Ensure paths generated by DetailPageResult.get_unique_folder() are unique among all items in detail_results

Return type:

None

is_clip_id_available(clip_id: CLIP_ID) -> bool

Check whether the given clip id is linked to an item (returns True if there is no link)

Parameters:

clip_id (CLIP_ID)

Return type:

bool

is_guid_matched(guid: GUID) -> bool

Check whether the item matching guid has a Clip associated with it

Parameters:

guid (GUID)

Return type:

bool

get_by_real_guid(real_guid: REAL_GUID) -> DetailPageResult | None

Get the DetailPageResult matching the given real_guid

If no match is found, None is returned.

Parameters:

real_guid (REAL_GUID)

Return type:

DetailPageResult | None

find_match_for_clip_id(clip_id: CLIP_ID) -> DetailPageResult | None | Literal[_DoesNotExistEnum.DoesNotExist]

Find a DetailPageResult match for the given clip_id

Parameters:

clip_id (CLIP_ID)

Return type:

DetailPageResult | None | Literal[_DoesNotExistEnum.DoesNotExist]

add_guid_match(clip_id: CLIP_ID, guid: GUID) -> None

Add a Clip.id -> FeedItem match to matched_guids

This may seem redundant considering the find_match_for_clip_id() method, but it is intended for adding matches for items without a video url to parse.

Parameters:
  • clip_id (CLIP_ID)

  • guid (GUID)

Return type:

None

add_clip_match_override(real_guid: REAL_GUID, clip_id: CLIP_ID | None | Literal[_DoesNotExistEnum.DoesNotExist]) -> None

Add a manual override for the given real_guid

Parameters:
  • real_guid (REAL_GUID) – The real_guid of the legistar item

  • clip_id (CLIP_ID | None | Literal[_DoesNotExistEnum.DoesNotExist]) – The clip model.Clip.id matching the item. If NoClip is given, this signifies that the item should not have a Clip associated with it. If None is given, any previously added overrides for real_guid will be removed.

Return type:

None

add_detail_result(item: DetailPageResult) -> None

Add a parsed DetailPageResult to detail_results

Parameters:

item (DetailPageResult)

Return type:

None

iter_guid_matches() -> Iterator[tuple[CLIP_ID, DetailPageResult]]

Iterate over items added by the add_guid_match() and add_clip_match_override() methods

Results are tuples of CLIP_ID and DetailPageResult

Return type:

Iterator[tuple[CLIP_ID, DetailPageResult]]

get_folder_for_item(item: GUID | DetailPageResult) -> Path

Get a local path to store files for a DetailPageResult

See DetailPageResult.get_unique_folder() for more details.

Parameters:

item (GUID | DetailPageResult)

Return type:

Path

get_or_create_files(guid: GUID) -> LegistarFiles

Get a LegistarFiles instance for guid, creating one if it does not exist

Parameters:

guid (GUID)

Return type:

LegistarFiles

get_file_uid(guid: GUID, key: Literal['agenda', 'minutes', 'agenda_packet', 'video']) -> LegistarFileUID

Get a unique key for the given GUID and LegistarFileKey

Parameters:
  • guid (GUID)

  • key (Literal['agenda', 'minutes', 'agenda_packet', 'video'])

Return type:

LegistarFileUID

get_attachment_uid(guid: GUID, name: AttachmentName) -> LegistarFileUID

Get a unique key for the given GUID and AttachmentName

Parameters:
  • guid (GUID)

  • name (AttachmentName)

Return type:

LegistarFileUID

get_path_for_uid(guid: GUID, uid: LegistarFileUID) -> tuple[Path, FileMeta | None]

Get filesystem path for the GUID and LegistarFileUID

Returns:

  • filename (Path): The local filename

  • meta (FileMeta, optional): The file’s metadata (if it exists)

Return type:

(tuple)

Parameters:
  • guid (GUID)

  • uid (LegistarFileUID)

iter_files_for_upload(guid: GUID) -> Iterator[tuple[LegistarFileUID, Path, FileMeta, bool]]

Iterate over files present locally for the given GUID

Yields:
  • uid (LegistarFileUID): The uid for the file type

  • filename (Path): The local file path

  • meta (FileMeta): Local meta data for the file

  • is_attachment (bool): True if the uid refers to an attachment, False otherwise

Parameters:

guid (GUID)

Return type:

Iterator[tuple[LegistarFileUID, Path, FileMeta, bool]]

get_file_path(guid: GUID, key: Literal['agenda', 'minutes', 'agenda_packet', 'video']) -> Path

Get the local path for the LegistarFiles object matching the given guid and file key

Parameters:
  • guid (GUID)

  • key (Literal['agenda', 'minutes', 'agenda_packet', 'video'])

Return type:

Path

set_uid_complete(guid: GUID, uid: LegistarFileUID, meta: FileMeta, pdf_links_removed: bool = False) -> LegistarFile | AttachmentFile

Set the file or attachment for the given parameters as “complete” (after successful download)

This calls either set_file_complete() or set_attachment_complete() depending on the uid.

Parameters:
  • guid (GUID)

  • uid (LegistarFileUID)

  • meta (FileMeta)

  • pdf_links_removed (bool)

Return type:

LegistarFile | AttachmentFile

set_file_complete(guid: GUID, key: Literal['agenda', 'minutes', 'agenda_packet', 'video'], meta: FileMeta, pdf_links_removed: bool = False) -> LegistarFile

Set the file for the given parameters as “complete” (after successful download)

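Parameters:
  • guid (GUID)

  • key (Literal['agenda', 'minutes', 'agenda_packet', 'video'])

  • meta (FileMeta)

  • pdf_links_removed (bool)
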
Return type:

LegistarFile

get_attachment_path(guid: GUID, name: AttachmentName) -> Path

Get the local path for an item in LegistarFiles.attachments

Parameters:
  • guid (GUID)

  • name (AttachmentName)

Return type:

Path

set_attachment_complete(guid: GUID, name: AttachmentName, meta: FileMeta, pdf_links_removed: bool = False) -> AttachmentFile

Set an item in LegistarFiles.attachments as “complete” (after successful download)

Parameters:
  • guid (GUID)

  • name (AttachmentName)

  • meta (FileMeta)

  • pdf_links_removed (bool)

Return type:

AttachmentFile

iter_url_paths_uid(guid: GUID) -> Iterator[FilePathURLComplete[LegistarFileUID]]

Iterate over all files and attachments for the given guid as FilePathURLComplete tuples using the uid as the key parameter

Parameters:

guid (GUID)

Return type:

Iterator[FilePathURLComplete[LegistarFileUID]]

iter_attachments(guid: GUID) -> Iterator[FilePathURLComplete[AttachmentName]]

Iterate over any LegistarFiles.attachments for the given guid (as FilePathURLComplete tuples)

Parameters:

guid (GUID)

Return type:

Iterator[FilePathURLComplete[AttachmentName]]

iter_incomplete_attachments(guid: GUID) -> Iterator[FilePathURL[AttachmentName]]

Iterate over LegistarFiles.attachments which have not been downloaded (as FilePathURL tuples)

Parameters:

guid (GUID)

Return type:

Iterator[FilePathURL[AttachmentName]]

iter_url_paths(guid: GUID) -> Iterator[FilePathURLComplete[Literal['agenda', 'minutes', 'agenda_packet', 'video']]]

Iterate over items in a LegistarFiles instance (as FilePathURLComplete tuples)

Parameters:

guid (GUID)

Return type:

Iterator[FilePathURLComplete[Literal['agenda', 'minutes', 'agenda_packet', 'video']]]

iter_incomplete_url_paths(guid: GUID) -> Iterator[FilePathURL[Literal['agenda', 'minutes', 'agenda_packet', 'video']]]

Iterate over items in a LegistarFiles instance which have not been downloaded (as FilePathURL tuples)

Parameters:

guid (GUID)

Return type:

Iterator[FilePathURL[Literal['agenda', 'minutes', 'agenda_packet', 'video']]]

iter_existing_url_paths(guid: GUID) -> Iterator[FilePathURL[Literal['agenda', 'minutes', 'agenda_packet', 'video']]]

Iterate over items in a LegistarFiles instance which have been successfully downloaded (as FilePathURL tuples)

Parameters:

guid (GUID)

Return type:

Iterator[FilePathURL[Literal['agenda', 'minutes', 'agenda_packet', 'video']]]

classmethod load(filename: PathLike, root_dir: Path | None = None) -> Self

Loads an instance from previously saved data

Parameters:
  • filename (PathLike)

  • root_dir (Path | None)

Return type:

Self

save(filename: PathLike, indent: int | None = 2) -> None

Saves all clip data as JSON to the given filename

Parameters:
  • filename (PathLike)

  • indent (int | None)

Return type:

None