`granicus_archiver.legistar.rss_parser`

granicus_archiver.legistar.rss_parser.FUTURE_TIMEDELTA = datetime.timedelta(days=1): Amount of time after a meeting_date that must pass before it is no longer considered a future item

granicus_archiver.legistar.rss_parser.is_guid(item: str) → TypeIs[GUID]

Check whether the given value is a valid GUID (a REAL_GUID followed by the feed’s date-time suffix)

Parameters:: item (str)
Return type:: TypeIs[GUID]

granicus_archiver.legistar.rss_parser.is_real_guid(item: str) → TypeIs[REAL_GUID]

Check whether the given value is a valid REAL_GUID (the actual GUID portion alone, without the date-time suffix)

Parameters:: item (str)
Return type:: TypeIs[REAL_GUID]
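Both checks could be expressed as regular expressions. The sketch below makes two assumptions: a REAL_GUID uses the usual 8-4-4-4-12 hex layout, and a full GUID appends a -YYYY-MM-DD-HH-MM-SS suffix, which is the layout used in the GuidCompare examples below. The function names are hypothetical. They return a plain bool instead of the TypeIs narrowing that the real functions provide.

```python
import re

# Assumed layouts, inferred from the examples in this page (not the library source)
REAL_GUID_RE = r'[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}'
DATETIME_SUFFIX_RE = r'-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}'


def sketch_is_real_guid(item: str) -> bool:
    # The bare GUID, with no date-time suffix
    return re.fullmatch(REAL_GUID_RE, item) is not None


def sketch_is_guid(item: str) -> bool:
    # The feed's "GUID": a real GUID followed by the date-time suffix
    return re.fullmatch(REAL_GUID_RE + DATETIME_SUFFIX_RE, item) is not None


def sketch_real_guid(item: str) -> str:
    # Strip the trailing date-time portion (compare FeedItem.real_guid)
    return re.sub(DATETIME_SUFFIX_RE + '$', '', item)
```

Under these assumptions the two types are mutually exclusive: a full GUID is never a valid REAL_GUID, and the reverse also holds.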

class granicus_archiver.legistar.rss_parser.GuidCompare(guid: GUID)

Bases: object

Helper to compare GUIDs

Since the “GUIDs” (a loose term, because they aren’t really GUIDs) contain date/time information, comparing them can help determine whether a feed item needs to be updated.

Instances can be compared using the ==, !=, >, >=, < and <= operators.

Using the following GUIDs:

>>> real_guid_a = 'F239FB22-A00A-6FF1-3E97-0F36043B96F6'
>>> a = f'{real_guid_a}-2023-01-01-12-30-00'
>>> b = f'{real_guid_a}-2024-01-01-12-30-00'

Both a and b use the same real GUID, but a is one year behind b:

>>> GuidCompare(a) == a
True
>>> GuidCompare(a) != b
True
>>> GuidCompare(a) == b
False
>>> GuidCompare(a) > b
False
>>> GuidCompare(a) < b
True
>>> GuidCompare(b) > a
True

If the GUID portions do not match, equality checks will reflect that:

>>> real_guid_b = '8F6DD61F-3498-3FF1-12B9-38DBE1CA9B06'
>>> c = f'{real_guid_b}-2024-01-01-12-30-00'
>>> GuidCompare(a) == c
False
>>> GuidCompare(b) == c
False
>>> GuidCompare(c) == a
False
>>> GuidCompare(c) == b
False

The < and > comparisons, however, are not supported in this case:

>>> GuidCompare(c) > a
Traceback (most recent call last):
    ...
TypeError: '>' not supported ...

Parameters:: guid (GUID)
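The comparison rules shown above could be implemented roughly as follows. This is a hypothetical sketch, not the library's implementation. It assumes the real GUID is the first 36 characters and that the remainder parses with the format %Y-%m-%d-%H-%M-%S. The ordering methods return NotImplemented when the real GUIDs differ. That is what makes Python raise the TypeError seen in the last example.

```python
from __future__ import annotations

import datetime

REAL_GUID_LEN = 36  # assumed: 32 hex digits plus 4 hyphens


class GuidCompareSketch:
    """Hypothetical stand-in for GuidCompare."""

    def __init__(self, guid: str) -> None:
        # Split "<REAL_GUID>-YYYY-MM-DD-HH-MM-SS" into its two parts
        self.real_guid = guid[:REAL_GUID_LEN]
        self.dt = datetime.datetime.strptime(
            guid[REAL_GUID_LEN + 1:], '%Y-%m-%d-%H-%M-%S'
        )

    @classmethod
    def _coerce(cls, other: object) -> GuidCompareSketch | None:
        # Accept either another instance or a raw GUID string
        if isinstance(other, GuidCompareSketch):
            return other
        if isinstance(other, str):
            return cls(other)
        return None

    def __eq__(self, other: object) -> bool:
        o = self._coerce(other)
        if o is None:
            return NotImplemented
        # Different real GUIDs are simply "not equal"; != is derived from this
        return (self.real_guid, self.dt) == (o.real_guid, o.dt)

    def _comparable(self, other: object) -> GuidCompareSketch | None:
        o = self._coerce(other)
        # Ordering is only meaningful when the real GUIDs match
        if o is None or o.real_guid != self.real_guid:
            return None
        return o

    def __lt__(self, other: object) -> bool:
        o = self._comparable(other)
        return NotImplemented if o is None else self.dt < o.dt

    def __le__(self, other: object) -> bool:
        o = self._comparable(other)
        return NotImplemented if o is None else self.dt <= o.dt

    def __gt__(self, other: object) -> bool:
        o = self._comparable(other)
        return NotImplemented if o is None else self.dt > o.dt

    def __ge__(self, other: object) -> bool:
        o = self._comparable(other)
        return NotImplemented if o is None else self.dt >= o.dt
```

In this sketch, a string that does not follow the assumed layout raises ValueError from strptime as soon as it is compared.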

class granicus_archiver.legistar.rss_parser.FeedItem(title: str, link: URL, guid: GUID, category: Category, meeting_date: datetime, pub_date: datetime)

Bases: Serializable

An RSS feed item representing a meeting in the Legistar calendar

A typical item looks like this:

<item>
    <title>City Council - 9/9/2024 - 2:00 PM</title>
    <link>https://mansfield.legistar.com/Gateway.aspx?.......</link>
    <guid isPermaLink="false">...</guid>
    <description/>
    <category>City Council</category>
    <pubDate>Tue, 10 Sep 2024 16:52:26 GMT</pubDate>
</item>

Note the value of the <title> element. It contains the title (or what should be the title) followed by a date and a time. Note also that the <pubDate> value would appear as 9/10/2024 - 11:52 AM after time zone conversion, not 9/9/2024 - 2:00 PM. This is likely the time the item was last modified in Legistar, which would explain the discrepancy.

This makes pubDate useless for determining the scheduled date/time of the event, so we are forced instead to extract it from the title and hope for the best.

Since the title carries no timezone information, however, we’re also forced to assume that it is in the municipality’s local time (see TZ), and we all know what assuming does.
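Here is a minimal sketch of that title handling. It assumes the date and time are always the last two " - "-separated parts, written as M/D/YYYY and h:mm AM/PM as in the example above. The helper name split_title is hypothetical; the real parsing lives in FeedItem.from_rss. The timezone is passed in explicitly as a stand-in for the TZ class attribute.

```python
from __future__ import annotations

import datetime


def split_title(raw_title: str, tz: datetime.tzinfo) -> tuple[str, datetime.datetime]:
    """Hypothetical helper: split 'Name - M/D/YYYY - h:mm AM' into parts."""
    # rsplit keeps any " - " inside the meeting name itself intact
    name, date_str, time_str = raw_title.rsplit(' - ', 2)
    naive = datetime.datetime.strptime(
        f'{date_str} {time_str}', '%m/%d/%Y %I:%M %p'
    )
    # The feed gives no timezone, so attach the assumed local one
    return name, naive.replace(tzinfo=tz)
```

In practice, tz would probably be a zoneinfo.ZoneInfo (for example ZoneInfo('America/Chicago') for a US Central municipality). A title that does not fit the assumed pattern raises ValueError.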

Parameters:

title (str)
link (URL)
guid (GUID)
category (Category)
meeting_date (datetime)
pub_date (datetime)

title: str: The meeting title. In the RSS feed this is, confusingly, a combination of the meeting name, date and time (see the notes above). When parsed, the date and time are stripped, leaving only the meeting name

link: URL: URL for the meeting details page

guid: GUID: A unique id for the item (despite the name, not a true GUID; see real_guid)

category: Category: The meeting “category” (sometimes also referred to as “Department”). Note that this may or may not match the value of Clip.location, although matching it is the intent.

meeting_date: datetime: The scheduled date and time of the meeting, parsed from the original title and converted to the local timezone

pub_date: datetime: Date and time the meeting was published (not the meeting date/time)

ITEM_IN_PAST_DELTA: ClassVar[timedelta] = datetime.timedelta(days=365): Age after which an item is considered “in the past” (default is one year)

TZ: ClassVar[ZoneInfo | None] = None: Local timezone used to parse meeting_date

property is_future: bool: Whether the item is in the future

property is_in_past: bool: Whether the item is older than ITEM_IN_PAST_DELTA
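The two properties could plausibly be computed as shown below. This sketch rests on guesses. It assumes is_future compares the current time against meeting_date + FUTURE_TIMEDELTA, following the description of that module-level constant. It assumes is_in_past compares the item's age with ITEM_IN_PAST_DELTA. The exact boundary handling (< versus <=) is also a guess. The now argument is passed in explicitly to keep the functions testable.

```python
import datetime

FUTURE_TIMEDELTA = datetime.timedelta(days=1)
ITEM_IN_PAST_DELTA = datetime.timedelta(days=365)


def sketch_is_future(meeting_date: datetime.datetime, now: datetime.datetime) -> bool:
    # Still "future" until FUTURE_TIMEDELTA has passed after the meeting
    return now < meeting_date + FUTURE_TIMEDELTA


def sketch_is_in_past(meeting_date: datetime.datetime, now: datetime.datetime) -> bool:
    # "In the past" once the meeting is older than ITEM_IN_PAST_DELTA
    return now - meeting_date > ITEM_IN_PAST_DELTA
```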

classmethod from_rss(elem: PyQuery) → Self

Parse and create an item from its RSS data

Parameters:: elem (PyQuery)
Return type:: Self
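from_rss operates on a PyQuery element. A rough standard-library equivalent would pull the raw fields out of an <item> with xml.etree.ElementTree and parse the RFC 2822 <pubDate> with email.utils.parsedate_to_datetime. The swapped-in parser and the helper name read_item_fields are for illustration only; they are not the module's code. The sketch also assumes <pubDate> is present.

```python
from __future__ import annotations

import email.utils
import xml.etree.ElementTree as ET


def read_item_fields(item_xml: str) -> dict[str, object]:
    """Hypothetical helper: extract the raw fields of one RSS <item>."""
    elem = ET.fromstring(item_xml)
    # findtext returns the element text ('' if empty, None if missing)
    fields: dict[str, object] = {
        tag: elem.findtext(tag) for tag in ('title', 'link', 'guid', 'category')
    }
    # RFC 2822 date, e.g. "Tue, 10 Sep 2024 16:52:26 GMT" -> aware datetime
    fields['pub_date'] = email.utils.parsedate_to_datetime(elem.findtext('pubDate'))
    return fields
```

Convert the resulting pub_date of 16:52 GMT to US Central daylight time (UTC-5) and you get the 11:52 AM mentioned in the notes above.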

classmethod to_csv(*items: FeedItem) → str

Get a comma-separated representation for the given feed items

The result will include a header followed by the results of to_csv_line() for each item given.

Parameters:: items (FeedItem)
Return type:: str

to_csv_line() → str

Get the comma-separated values of this item

The attributes returned will be

title
meeting_date (the date() portion only)
link

Return type:: str
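Here is a sketch of the two CSV methods. It uses the csv module so that titles containing commas are quoted. The header names, the ISO date format and the ItemSketch class are assumptions for illustration; the real output format may differ.

```python
from __future__ import annotations

import csv
import datetime
import io
from dataclasses import dataclass


@dataclass
class ItemSketch:
    """Minimal hypothetical stand-in for FeedItem."""
    title: str
    link: str
    meeting_date: datetime.datetime

    def to_csv_line(self) -> str:
        buf = io.StringIO()
        # csv.writer quotes any field containing a comma or quote character
        csv.writer(buf, lineterminator='\n').writerow(
            [self.title, self.meeting_date.date().isoformat(), self.link]
        )
        return buf.getvalue().rstrip('\n')

    @classmethod
    def to_csv(cls, *items: ItemSketch) -> str:
        header = 'title,meeting_date,link'  # assumed column names
        return '\n'.join([header, *(item.to_csv_line() for item in items)])
```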

property real_guid: REAL_GUID: The portion of guid that IS ACTUALLY A GUID (with the ridiculous date-time portion removed. Really, I’m not making this up)

class granicus_archiver.legistar.rss_parser.Feed(items: Iterable[FeedItem] | None = None, category_maps: dict[Location, Category] | None = None)

Bases: Serializable

A representation of Legistar’s calendar RSS feed

The URL for this should have the options configured to show “All Years” and “All Departments” on the main /Calendar.aspx page. That is, unless there are more than 100 meetings in your agenda history (which is very likely to be the case).

The RSS feed that Legistar generates, with all of their years of wisdom, limits the results to 100 items, making it almost completely useless for archival purposes.

The only known way around this is to parse separate feeds, selecting each department (and sometimes each year) individually. This seems (and is!) a horribly laborious process, but it’s definitely easier than manually downloading and naming over 4000 files for around 2000 meetings!

Parameters:

items (dict[GUID, FeedItem])
category_maps (dict[Location, Category])

item_list: list[FeedItem]: The feed items as FeedItem instances

items: dict[GUID, FeedItem]: Mapping of items using their guid as keys

items_by_category: dict[Category, dict[GUID, FeedItem]]: Mapping of items by their category
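The relationship between item_list, items and items_by_category can be sketched as two dictionary builds over the same list. The Item class here is a minimal hypothetical stand-in for FeedItem. How duplicate guids are handled (here the last one wins) is an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """Minimal hypothetical stand-in for FeedItem."""
    guid: str
    category: str


def index_items(
    item_list: list[Item],
) -> tuple[dict[str, Item], dict[str, dict[str, Item]]]:
    # items: guid -> item (a repeated guid keeps the last occurrence)
    items = {item.guid: item for item in item_list}
    # items_by_category: category -> {guid -> item}
    by_category: defaultdict[str, dict[str, Item]] = defaultdict(dict)
    for item in item_list:
        by_category[item.category][item.guid] = item
    return items, dict(by_category)
```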

category_maps: dict[Location, Category]

A dict of any custom mappings to match the Clip.location fields to their appropriate FeedItem.category

The keys for this should be the location with the values set to the category.

classmethod from_feed(doc_str: str | bytes, category_maps: dict[Location, Category] | None = None, overflow_allowed: bool = False) → Self

Create an instance by parsing the supplied RSS data

Parameters:

doc_str (str | bytes) – The raw RSS/XML string
category_maps (dict[Location, Category] | None) – Value for the feed’s category_maps
overflow_allowed (bool) – If True, disables raising LegistarThinksRSSCanPaginateError when the feed contains 100 items. The default (False) allows the exception to be raised.

Raises:

LegistarThinksRSSCanPaginateError – If the feed’s item count is 100 and overflow_allowed is False

Return type:

Self
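The overflow guard in from_feed amounts to counting the <item> elements and treating exactly 100 as a sign of truncation. The standard-library sketch below assumes a plain RSS 2.0 layout (rss/channel/item). ElementTree stands in for the PyQuery-based parsing the module uses. FeedOverflowError is a hypothetical stand-in for LegistarThinksRSSCanPaginateError.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RSS_ITEM_LIMIT = 100  # the cap described in the Feed notes above


class FeedOverflowError(Exception):
    """Hypothetical stand-in for LegistarThinksRSSCanPaginateError."""


def count_feed_items(doc_str: str | bytes, overflow_allowed: bool = False) -> int:
    root = ET.fromstring(doc_str)
    # RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
    n = len(root.findall('./channel/item'))
    if n == RSS_ITEM_LIMIT and not overflow_allowed:
        raise FeedOverflowError(f'{n} items: the feed was probably truncated')
    return n
```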

find_clip_match(clip: Clip, search_delta: timedelta = datetime.timedelta(seconds=14400)) → FeedItem

Attempt to match the given clip to a FeedItem

The Clip.location is first used to filter items by category (using any custom overrides in category_maps).

The items in that category are then searched for the FeedItem.meeting_date closest to Clip.datetime. That item is returned if it falls within +/- search_delta (four hours by default).

Raises:

CategoryError – If no category match was found
DatetimeError – If no match could be found for the clip’s datetime

Parameters:

clip (Clip)
search_delta (timedelta)

Return type:

FeedItem
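Below is a standard-library sketch of that two-step search. It is hypothetical throughout:

- Item, find_match and the two exception classes stand in for FeedItem, find_clip_match, CategoryError and DatetimeError.
- The clip is reduced to its location and datetime.
- Falling back to the location itself when category_maps has no entry is an assumption.

The default search_delta of four hours matches the documented default of 14400 seconds.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass


@dataclass
class Item:
    """Minimal hypothetical stand-in for FeedItem."""
    category: str
    meeting_date: datetime.datetime


class CategoryMatchError(LookupError):
    """Hypothetical stand-in for CategoryError."""


class DatetimeMatchError(LookupError):
    """Hypothetical stand-in for DatetimeError."""


def find_match(
    items: list[Item],
    location: str,
    when: datetime.datetime,
    category_maps: dict[str, str],
    search_delta: datetime.timedelta = datetime.timedelta(hours=4),
) -> Item:
    # Apply any custom override; otherwise assume location == category
    category = category_maps.get(location, location)
    candidates = [item for item in items if item.category == category]
    if not candidates:
        raise CategoryMatchError(f'no items for category {category!r}')
    # Pick the item whose meeting_date is closest to the clip's datetime
    best = min(candidates, key=lambda item: abs(item.meeting_date - when))
    if abs(best.meeting_date - when) > search_delta:
        raise DatetimeMatchError(f'no meeting within {search_delta} of {when}')
    return best
```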
