Regex filters #1864

Open
willhains opened this issue Mar 9, 2020 · 25 comments

Comments

@willhains

(Feature request.)

It would be great to have a regex filter to filter out unwanted posts from a feed, for example podcast episode announcements, weekly recap posts, things like that.

@exrector

In the closed issue below, I deliberately focused on a GLOBAL filter, because I know several RSS readers with separate filters for each service / resource / site / source, and that is not user-friendly at all. Authors publish the same articles on different sites, so you have to add the same "word" to every single blacklist. This is stupid. In an ideal world, it looks like this: I select a word and press "block" in the context menu. The first thing I will block will be "custom icons ios" 😂

Could you add a filter function in the future? (Block words in topic names and content) blacklist of words exclusion is applicable globally to all accounts. issue

@adinklotz

adinklotz commented Mar 17, 2021

I would say it's important to have both global and per-feed filters. There are some topics I want to block everywhere, but there are also specific columns from specific sources I follow that I would like to block by matching the column name in the title, only on that source.

I don't know how complicated they're willing to get with this, but I would love a full filtering system a la Gmail, so I can filter out any combination of:

  • Regex match on title
  • Regex match on contents
  • Feed source
  • Author
  • RSS category tags
  • etc
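
The combination of criteria listed above could be sketched roughly as follows. This is only an illustration, not NetNewsWire code: the article shape (`title`, `content`, `feedUrl`, `author`, `categories`) and the names `matchesRule` and `applyBlockRules` are assumptions made up for the example. A rule without a `feedUrl` acts as a global filter; a rule with one acts as a per-feed filter.

```javascript
// Hypothetical article shape: { title, content, feedUrl, author, categories }.
// A rule lists optional criteria; an article matches only if every
// criterion that is present matches (logical AND, as in Gmail filters).
function matchesRule(article, rule) {
  if (rule.title && !rule.title.test(article.title)) return false;
  if (rule.content && !rule.content.test(article.content)) return false;
  if (rule.feedUrl && rule.feedUrl !== article.feedUrl) return false;
  if (rule.author && rule.author !== article.author) return false;
  if (rule.category && !article.categories.includes(rule.category)) return false;
  return true;
}

// Keep only the articles that no blocking rule matches.
function applyBlockRules(articles, rules) {
  return articles.filter((a) => !rules.some((r) => matchesRule(a, r)));
}

const rules = [
  // Global rule: block a topic everywhere.
  { title: /black friday/i },
  // Per-feed rule: block one column, but only on one source.
  { title: /^Weekly Recap:/, feedUrl: "https://example.com/feed.xml" },
];

const articles = [
  { title: "Black Friday deals", content: "", feedUrl: "https://a.example/rss", author: "A", categories: [] },
  { title: "Weekly Recap: March", content: "", feedUrl: "https://example.com/feed.xml", author: "B", categories: [] },
  { title: "Weekly Recap: March", content: "", feedUrl: "https://other.example/rss", author: "C", categories: [] },
];

// Only the third article survives: the same title on a different source is not blocked.
const kept = applyBlockRules(articles, rules);
```

The regular expressions deliberately avoid the `g` flag, since `RegExp.prototype.test` is stateful on global patterns.
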
@LitoFrame

A filter function would be amazing. I have some feeds where I only want to see 5-10% of the articles, which I could pick out with keywords...

@Joilence

I think the filter function would be a killer feature compared to other RSS client apps and wonder why this issue was closed before.

@exrector

Feedly

[Image]

@Saklad5

Saklad5 commented Mar 31, 2022

I'd like to make a further request: the ability to "split" RSS feeds based on regular expression groups.


Many websites, such as YouTube, have a "subscription" method that is used for following certain content. The output of this can often be accessed as a feed, and doing this allows you to have a single source of truth with regards to what you are consuming from that site. For instance, subscribing to a channel on YouTube would add videos from that channel to the feed you already have in NetNewsWire, and unsubscribing would remove videos. While you can add channel feeds to NetNewsWire directly, doing this on a channel-by-channel basis can be an immense headache. This approach avoids that hassle.

Of course, the result is a massive heterogeneous "feed of feeds", which doesn't lend itself well to reading. So here's my thinking: what if you could specify a filter that split a feed into sub-feeds based on the output? Each extracted value could be presented as a distinct feed, and you'd end up with the best of both worlds.
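
As a rough sketch of that splitting idea, assuming each article is a plain object and that the sub-feed name is whatever the first capture group of a regular expression extracts from one field: `splitFeed`, its parameters, and the sample data are all hypothetical, chosen only to illustrate the grouping.

```javascript
// Group articles into sub-feeds keyed by the first capture group of `pattern`,
// applied to the field named by `field`. Articles that do not match go into
// a fallback bucket so nothing is dropped.
function splitFeed(articles, pattern, field = "title", fallback = "Other") {
  const subFeeds = new Map();
  for (const article of articles) {
    const match = pattern.exec(String(article[field] ?? ""));
    const key = match && match[1] ? match[1] : fallback;
    if (!subFeeds.has(key)) subFeeds.set(key, []);
    subFeeds.get(key).push(article);
  }
  return subFeeds;
}

// Hypothetical "feed of feeds" where the author field carries the channel name.
const subscriptions = [
  { title: "Video 1", author: "Channel A" },
  { title: "Video 2", author: "Channel B" },
  { title: "Video 3", author: "Channel A" },
];

// One sub-feed per distinct author value.
const byChannel = splitFeed(subscriptions, /^(.+)$/, "author");
```

Because articles are only grouped and never removed, a per-feed unread count would stay valid for the feed as a whole.
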


Ideally, websites would host a list of feed URLs (like OPML) that an aggregator would check like a conventional feed and use to update its list of feeds, but that doesn't seem to be a thing. Perhaps there isn't much demand for lists of lists of articles.

@djpowers

I'd love to see such a feature. In my use case, I have some feeds from publications that allegedly have different feeds for different categories of the site, but (intentionally or not) pollute them with other unrelated articles.

I'd love to be able to set something up to only allow RSS items beginning with the URL https://example.com/desired-section (for example) to filter out extra items.
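
A URL-prefix allow-list like that might look something like the sketch below. The item shape (`title`, `link`) and the function name `allowByUrlPrefix` are assumptions for illustration only.

```javascript
// Keep only items whose link starts with one of the allowed URL prefixes.
// Items without a link are treated as non-matching.
function allowByUrlPrefix(items, allowedPrefixes) {
  return items.filter((item) =>
    allowedPrefixes.some((prefix) => (item.link ?? "").startsWith(prefix))
  );
}

const items = [
  { title: "Wanted", link: "https://example.com/desired-section/post-1" },
  { title: "Unrelated", link: "https://example.com/other-section/post-2" },
];

const wanted = allowByUrlPrefix(items, ["https://example.com/desired-section"]);
```

Note that a bare prefix such as `https://example.com/desired-section` would also match `https://example.com/desired-section-2/...`; adding a trailing slash to the prefix makes the match stricter.
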

@Saklad5

Saklad5 commented Apr 10, 2022

> In my use case, I have some feeds from publications that allegedly have different feeds for different categories of the site, but (intentionally or not) pollute them with other unrelated articles.

The first resort should be complaining to publications to fix their site. You should only implement workarounds if you have to (that is, if you know it is intentional).

I've gotten many feeds fixed simply by tracking down points of contact and explaining what's wrong.

@djpowers

> > In my use case, I have some feeds from publications that allegedly have different feeds for different categories of the site, but (intentionally or not) pollute them with other unrelated articles.
>
> The first resort should be complaining to publications to fix their site. You should only implement workarounds if you have to (that is, if you know it is intentional).
>
> I've gotten many feeds fixed simply by tracking down points of contact and explaining what's wrong.

Appreciate the tip, I've actually reached out to a general "help" email address to ask if this was by design, but this is a bigger publication and I didn't hear back. I'll follow up again though and see if I can find someone more specific to address it.

But as for the main feature request, this would definitely be helpful for sites that only have one all-encompassing feed which you want to narrow down to certain sections.

@olofhellman
Contributor

My gut feeling is that this is not really a NetNewsWire feature -- for the reasons that #2605 got closed. BUT, it does seem like there's an open space for a companion app which synthesizes RSS feeds given some custom logic. That is, the companion app would let you specify some rules and produce a feed (which might only exist locally), and then NNW subscribes to that. Looking around the web, that's a feature that Zapier calls a "superfeed": https://zapier.com/blog/make-your-own-rss-superfeed/

@willhains
Author

> for the reasons that #2605 got closed.

Mm? #2605 was closed because it’s a duplicate of this issue.

@brentsimmons
Collaborator

We would like to add smart feeds, which are like smart playlists or smart mailboxes. We would also like to add script feeds, where NetNewsWire actually runs a script that returns a feed. With one or both of these you could probably get something close to what you’re looking for.

@olofhellman
Contributor

> Mm? #2605 was closed because it’s a duplicate of this issue.

Sorry, my bad. Was thinking of a different issue.

@justinferrell

Black Friday has renewed my want for this because almost every one of my folders is a wall of deals with no real news. I’d love to be able to mute “Black Friday” for a bit. Being able to set an expiring filter, like Mastodon’s expiring mute, would be even better.
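
An expiring mute could be modelled along these lines. This is a minimal sketch under assumptions: the mute shape (`phrase`, `expiresAt` as milliseconds since the epoch, `null` for "never") and the function names are invented for the example and are not taken from NetNewsWire or Mastodon.

```javascript
// A mute has a phrase and an optional expiry time (milliseconds since epoch).
// `expiresAt: null` means the mute never expires.
function isMuteActive(mute, now) {
  return mute.expiresAt === null || now < mute.expiresAt;
}

// Case-insensitive substring match against the title, ignoring expired mutes.
function isMuted(article, mutes, now = Date.now()) {
  const title = article.title.toLowerCase();
  return mutes.some(
    (m) => isMuteActive(m, now) && title.includes(m.phrase.toLowerCase())
  );
}

const DAY_MS = 24 * 60 * 60 * 1000;
const createdAt = Date.UTC(2022, 10, 25); // 25 Nov 2022 (months are 0-based)
const mutes = [{ phrase: "Black Friday", expiresAt: createdAt + 7 * DAY_MS }];
const deal = { title: "Best Black Friday deals" };

const mutedDuringWeek = isMuted(deal, mutes, createdAt + DAY_MS);
const mutedAfterExpiry = isMuted(deal, mutes, createdAt + 8 * DAY_MS);
```

Passing `now` explicitly keeps the check deterministic and easy to test; expired mutes simply stop matching and could be cleaned up lazily.
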

@brijazz

brijazz commented Dec 6, 2022

Another vote for expiring mute/filter as @justinferrell mentioned - would be great in helping to avoid "The White Lotus" posts until I've had a chance to watch, dammit!

@johnsturgeon

Rolling in here to add my vote for content filters / mute / killfile / etc...

@joeworkman

+1 for mute filters as well

@brentsimmons
Collaborator

Here’s the thing about filters — they will be bad for performance and memory use.

Right now we can get unread counts for each feed with some very simple and very fast SQL queries. Those queries just return a count, and they don’t need to fetch entire rows and all that content. This is great for keeping NetNewsWire speedy and responsive.

If we add filters, then unread counts need to take into account filtered-out articles, which means the queries we’re using now won’t be accurate. They’d be too high.

So, to add filters, we’ll either have new — and complex and slow — SQL queries, or we’ll have to do this in code by reading in the content of each article row and running the filter on it. Even in the best case scenario (we can still use SQL) this is going to be much slower and use more memory.

That said, it would be a great feature! But it’s very, very difficult to do in a way that wouldn’t degrade performance unacceptably. That’s why we’d probably do smart feeds before we ever get to this.

@ulysseskan

What if turning on filters disables unread counts? Some people may value filters more highly than seeing how many articles are new.

You'd get a list of higher quality articles ordered by date. To limit memory impact, only filter on strings in the article title.

@Saklad5

Saklad5 commented Jan 13, 2023

> What if turning on filters disables unread counts? Some people may value filters more highly than seeing how many articles are new.
>
> You'd get a list of higher quality articles ordered by date. To limit memory impact, only filter on strings in the article title.

I was thinking something similar, but I feel this isn't a priority feature anyway. I certainly don't want to sacrifice the phenomenal performance it currently has for it.

If the "filters" just separate a feed into subsets (presented like feeds in a folder) rather than removing articles entirely, the unread count could simply be present for the entire feed rather than the subsets.

@johnsturgeon

johnsturgeon commented Jan 13, 2023

> Here’s the thing about filters — they will be bad for performance and memory use.

Caveat, I've not even looked at the code, but wouldn't it be possible to add a flag to the post when it's ingested as 'filtered' and just document the fact that the filter is for 'incoming' posts only?

Then the SQL could select where filtered==false

Maybe even have a checkbox when adding a filter saying "Apply filter to unread articles (note this may take some time if you have a lot of articles)"
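
To make the flag idea concrete, here is a small sketch that uses an in-memory array as a stand-in for the article table. Everything in it is an assumption for illustration (the row shape, the `filtered` column name, and the SQL in the comment); it is not how NetNewsWire stores articles.

```javascript
// Hypothetical in-memory stand-in for the article table.
// The filter runs once, at ingest, and stores its verdict in a `filtered` flag.
function ingest(store, incoming, blockPattern) {
  for (const article of incoming) {
    store.push({
      id: article.id,
      title: article.title,
      read: false,
      filtered: blockPattern.test(article.title),
    });
  }
}

// Counting only inspects two boolean fields, never the article text. With a
// real database this would be roughly (hypothetical table and column names):
//   SELECT COUNT(*) FROM articles WHERE read = 0 AND filtered = 0
function unreadCount(store) {
  return store.filter((row) => !row.read && !row.filtered).length;
}

// Optional pass for the "apply filter to unread articles" checkbox:
// flags rows that are already stored and still unread.
function applyToExistingUnread(store, blockPattern) {
  for (const row of store) {
    if (!row.read && blockPattern.test(row.title)) row.filtered = true;
  }
}

const store = [];
ingest(
  store,
  [{ id: 1, title: "Sponsored: widgets" }, { id: 2, title: "Real news" }],
  /^Sponsored:/
);
const countAfterIngest = unreadCount(store);
```

The expensive part (running the pattern over article text) happens once per article at ingest, or once per stored row when a new filter is applied retroactively, rather than on every unread-count query.
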

Almost like mail rules?

Aaaand.. now that I think about it, maybe mail rules type filtering would really be the solution for filtering out articles, and filtering some into local folders, instead of smart folders?

@jacobpledger

That's kind of how I think of them - like very basic mail rules, without the sorting or trash bin. Feedbin has it where it applies them on fetch (or on demand) and just marks the matches as read or starred.

I'm not sure if that's the kind of filtering being discussed here, as the items still show up in the feed, but you won't see them if you're only looking at unread items.

For me, just being able to automate marking stuff as read is sufficient, although it's admittedly not as cool as the other approaches here of hacking up feeds.

@vwkd

vwkd commented Feb 27, 2023

@brentsimmons #1864 (comment)

> Here’s the thing about filters — they will be bad for performance and memory use.

I'd be happy with a simple filter that filters the feed while fetching it. It would discard any articles that don't match the criteria and only include articles that match the criteria. The filter would likely sit somewhere between parsing the fetched feed and saving it to the database.

Since only the filtered articles ever end up in the database, the database needs to know nothing about filters and its performance shouldn't be decreased. Much more likely its performance would increase, since it now contains fewer articles than before, or at worst equally many.

As for decreased performance of fetching, a local filter operation over the in-memory data structure of the feed is likely small compared to the existing bottleneck of network round trip time. And even if it isn't, I'd be happy to wait a few milliseconds longer at fetch time to have a clean feed.

In JavaScript pseudo code, I'm imagining something like

const feed = await fetchFeed(url);
const articles = parseFeed(feed);
- await saveToDatabase(articles);
+ const articlesCleaned = filterArticles(articles);
+ await saveToDatabase(articlesCleaned);
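
A runnable version of that pseudo code might look like the sketch below. The stages are synchronous stubs so the example is self-contained: `parseFeed`, `filterArticles`, `saveToDatabase`, and `refresh` are hypothetical stand-ins, and the database is just an array.

```javascript
// Stand-ins for the real pipeline stages; all of them are hypothetical.
function parseFeed(feed) {
  return feed.entries;
}

// `keep` is a predicate: articles for which it returns true are stored.
function filterArticles(articles, keep) {
  return articles.filter(keep);
}

function saveToDatabase(db, articles) {
  db.push(...articles);
}

// The filter sits between parsing and saving, so the database never sees
// the discarded articles and needs no knowledge of filters.
function refresh(db, feed, keep) {
  const articles = parseFeed(feed);
  const cleaned = filterArticles(articles, keep);
  saveToDatabase(db, cleaned);
  return cleaned.length;
}

const db = [];
const fetchedFeed = {
  entries: [{ title: "Episode 12 is out" }, { title: "An essay" }],
};
const saved = refresh(db, fetchedFeed, (a) => !/^Episode \d+/.test(a.title));
```

One trade-off of this design: since discarded articles are never stored, loosening a filter later cannot bring them back unless the feed still lists them on the next fetch.
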
@core-code

+1 - this would be perfect for hiding advertisements and sponsored posts - a simple match/contains filter would be enough for many cases.

@randomsequence

> Here’s the thing about filters — they will be bad for performance and memory use.
>
> Right now we can get unread counts for each feed with some very simple and very fast SQL queries. Those queries just return a count, and they don’t need to fetch entire rows and all that content. This is great for keeping NetNewsWire speedy and responsive.

How about filters which mark filtered items as read? That way the unread counts wouldn't need any changes, and you'd still be able to see filtered articles by turning off the unread filter in the UI.
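
A minimal sketch of that mark-as-read approach, assuming articles are plain objects with a `read` flag (the names `markFilteredAsRead` and `countUnread` are made up for the example):

```javascript
// Instead of hiding or dropping matches, mark them as read at fetch time.
// Returns new objects for matches and leaves the input array untouched.
function markFilteredAsRead(articles, blockPattern) {
  return articles.map((a) =>
    blockPattern.test(a.title) ? { ...a, read: true } : a
  );
}

// The existing unread count logic stays exactly as it is.
function countUnread(articles) {
  return articles.filter((a) => !a.read).length;
}

const fetched = [
  { title: "Sponsored: mattress", read: false },
  { title: "Release notes", read: false },
];
const processed = markFilteredAsRead(fetched, /^Sponsored:/);
const unread = countUnread(processed);
```

All articles are still stored, so nothing is lost if a filter turns out to be too aggressive; the matches are just no longer counted or shown as unread.
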
