API

Some of these methods can be used to interact with a SQLite3 database. Others are useful if you’d like to interact with the article data.

gRSS

class gathernews.gRSS.CaptureFeeds(path)

Commits RSS news feeds to a SQLite3 database

create_tables()

Creates tables for RSS news feeds

Creates database tables, with each RSS link feeding into its own table, because it is easier to aggregate separate tables than to deaggregate a combined one.
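
The one-table-per-feed layout can be sketched with the standard sqlite3 module. This is only an illustration and not the GatherNews source: the helper create_feed_tables, the table names, and the column names are assumptions. The four article columns mirror the fields listed under rss_feeds_data(), and primary_key mirrors the key mentioned under rm_duplicates().

```python
import sqlite3

# Hypothetical table names; GatherNews builds its own names from each feed.
FEED_TABLES = ["example_world_news", "example_tech_news"]


def create_feed_tables(conn, table_names):
    """Create one table per feed, skipping tables that already exist."""
    for name in table_names:
        # Table names cannot be bound as SQL parameters, so the name is
        # quoted as an identifier (embedded double quotes are doubled).
        quoted = '"' + name.replace('"', '""') + '"'
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {quoted} ("
            "primary_key INTEGER PRIMARY KEY, "
            "title TEXT, description TEXT, link TEXT, published TEXT)"
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for illustration
create_feed_tables(conn, FEED_TABLES)
```

Keeping each feed in its own table means a combined view can later be built with a UNION query, whereas splitting a single mixed table back into feeds would need an extra column recording the source of each row.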

fix_create_table_bug()

Fix the create table bug

In GatherNews 0.1.0, a bug was introduced that prevents you from adding new RSS feeds to ‘feeds_list.txt’ after your initial call to the create_tables() method.

This method was created because we have no way of knowing which RSS feed links match which RSS table names without making a call to each RSS feed and recreating each table name.

If you have previously used GatherNews 0.1.0, call this method once on your previously created ‘FeedMe.db’ before calling any other methods. Calling it once should resolve the issue.

Returns:
Writes a JSON object called ‘previous_feeds_list’ to your disk, which fixes the create_tables() bug.
Raises:
UserWarning: This bug fix is not needed

RSS links used to pull feeds

get_tablenames()

All table names are extracted for use in SQL queries.

Returns:
revised_list: A list of all table names in the database.
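
One common way to list the tables of a SQLite database is to query its built-in sqlite_master catalog. The sketch below assumes that approach. The helper list_table_names and the demo tables are hypothetical, and get_tablenames() itself may work differently.

```python
import sqlite3


def list_table_names(conn):
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.execute("CREATE TABLE feed_a (title TEXT)")
conn.execute("CREATE TABLE feed_b (title TEXT)")
table_names = list_table_names(conn)
```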

load_db()

Loads the sqlite3 database

make_table_names(RSS_link, create_these_tables)

Make the table names for the sqlite3 database.

Args:
RSS_link: An RSS link from ‘feeds_list.txt’.
create_these_tables: A list of table names to be created.
Returns:
The ‘create_these_tables’ list is returned with a table name appended to it.

populate_db()

Populates the database. Queries are matched with dict keys, using the table names they share as the reference point, and each key provides the values for its query. This allows rows to be populated for each table.
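
The matching of dict keys to tables could look roughly like the sketch below. It assumes a dict shaped like the one described under rss_feeds_data(). The populate helper, the table names, the column names, and the sample articles are all made up for illustration.

```python
import sqlite3

# Hypothetical feed data: table name -> list of article tuples.
feeds = {
    "feed_a": [
        ("Title 1", "Summary 1", "http://example.com/1", "2014-01-01 09:00"),
    ],
    "feed_b": [
        ("Title 2", "Summary 2", "http://example.com/2", "2014-01-02 10:00"),
        ("Title 3", "Summary 3", "http://example.com/3", "2014-01-03 11:00"),
    ],
}

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
for table in feeds:
    conn.execute(
        f'CREATE TABLE "{table}" (primary_key INTEGER PRIMARY KEY, '
        "title TEXT, description TEXT, link TEXT, published TEXT)"
    )


def populate(conn, feeds):
    """Insert each list of article tuples into the table named by its key."""
    for table, articles in feeds.items():
        conn.executemany(
            f'INSERT INTO "{table}" (title, description, link, published) '
            "VALUES (?, ?, ?, ?)",
            articles,
        )
    conn.commit()


populate(conn, feeds)
counts = {
    table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    for table in feeds
}
```

Article values are passed as bound parameters, so titles or descriptions containing quotes do not break the INSERT statement.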

read_file(path, your_file_name)

Reads in a file so that only RSS links are included

Unfortunately, .readlines() or .read() alone was pulling in extra ‘\n’ symbols not related to the RSS links. This approach uses regular expressions to list only items that are consistent with an RSS feed link.

Args:
path: The file path. Ex. “/home/tyler/Gathernews/gathernews”
your_file_name: Name of the file you want to read.
Returns:
List of strings where each string is a link to an RSS feed
Raises:
UserWarning: “Could not recognize the file”
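
A regular-expression filter of this kind might look like the following sketch. The pattern, the helper name read_links, and the sample file contents are assumptions, not the expression GatherNews actually uses.

```python
import os
import re
import tempfile

# Assumed pattern: any run of non-whitespace starting with http:// or https://
LINK_PATTERN = re.compile(r"https?://\S+")


def read_links(path, file_name):
    """Return only the link-like strings found in the given file."""
    with open(os.path.join(path, file_name)) as handle:
        return LINK_PATTERN.findall(handle.read())


# Demonstration with a throwaway file that contains a blank line.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "feeds_list.txt"), "w") as handle:
        handle.write("http://example.com/rss.xml\n\nhttps://example.org/feed\n")
    links = read_links(folder, "feeds_list.txt")
```

Because \S stops at any whitespace, trailing newline characters and blank lines never end up in the returned list.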

rm_duplicates()

Removes duplicate entries. A limitation of this approach is that only one duplicate entry per item is removed (the one with the lowest primary_key value). If an item has more than two entries, duplicates will remain, which introduces a bug.
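
The limitation can be reproduced with a small sqlite3 sketch. It assumes duplicates are identified by title and that the deletion is done with a single SQL statement. Both are assumptions for illustration, as are the table and its contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.execute("CREATE TABLE feed_a (primary_key INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO feed_a (title) VALUES (?)",
    [("twice",), ("twice",), ("thrice",), ("thrice",), ("thrice",), ("once",)],
)

# For every title that occurs more than once, delete only the single row
# with the lowest primary_key.
conn.execute(
    "DELETE FROM feed_a WHERE primary_key IN ("
    "SELECT MIN(primary_key) FROM feed_a GROUP BY title HAVING COUNT(*) > 1)"
)
conn.commit()

# Count the rows left for each title.
remaining = dict(
    conn.execute("SELECT title, COUNT(*) FROM feed_a GROUP BY title").fetchall()
)
```

After one pass, the title stored twice is left with a single row, but the title stored three times still has two rows. That leftover pair is the case the limitation describes.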

rss_feeds_data()

Dictionary of table names mapped to a list of tuples for articles

This data structure is the ‘last stop’ before the article information is loaded into the SQL database. As such, it can also be called by itself if you do not want to load a database.

Returns:
Each key is a table name in the SQL database. The table name is mapped to a list of tuples. Each tuple in the list contains one string for every field in the database schema. The tuple is: (article title, article description, article link, date/time article published)
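
As a rough illustration of this shape, the sketch below builds a dictionary by hand and reads the titles back out. The table name, the article values, and the helper titles_by_table are invented for the example. Real output depends on the feeds listed in ‘feeds_list.txt’.

```python
# Hypothetical example of the returned structure (all values are made up).
feeds_data = {
    "example_feed": [
        # (article title, article description, article link, date/time published)
        (
            "Article title",
            "Short description",
            "http://example.com/a1",
            "Mon, 06 Jan 2014 09:00:00 GMT",
        ),
    ],
}


def titles_by_table(feeds_data):
    """Map each table name to the list of its article titles."""
    return {
        table: [article[0] for article in articles]
        for table, articles in feeds_data.items()
    }


titles = titles_by_table(feeds_data)
```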
