Myth Buster: NSF doesn't scale

In a lot of customer discussions I hear: "Oh we need to go RDBMS since Notes' NSF doesn't scale". Which is an interesting statement. When digging deeper into that statement, I get a number of reasons why they believe that:

Somebody told them
They had a problem in R4
The application got slower and slower over the years (and yes, it's that same old server)
The workflow application using reader fields is so slow
They actually don't know

Then I show them this:
More than 1.6 million documents in an NSF

(actually not the graphic, but the live property box in my Notes client). The question quickly arises how acceptable performance can be achieved for such a database. There are a few pointers to observe:

Watch your disk fragmentation (troubleshooting tip on the Notes and Domino wiki)
Be clear about your reader and author fields usage. In case the RDBMS fans insist on their solution, ask them to build the reader field equivalent in RDBMS and measure performance then.
Watch your view selection formulas carefully. (You don't use @Now, @Today, @Yesterday or @Tomorrow do you?)
You want to use DAOS to keep attachments out of the NSF (helps with fragmentation) -- don't forget to buy a disk for your transaction log.
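
For the RDBMS fans who take up the reader field challenge, here is a minimal sketch of what such an equivalent might look like, using Python's built-in sqlite3 module. The table names (docs, doc_readers), the sample names and the visible_docs helper are all made up for illustration, and the visibility rule is simplified: a document without reader entries is visible to everyone, a document with reader entries only to the names, groups or roles listed. A real comparison would need a realistic number of rows and real group expansion.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical reader-field schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE docs (
            id INTEGER PRIMARY KEY,
            subject TEXT NOT NULL
        );
        -- One row per (document, reader entry): the multi-value reader field
        CREATE TABLE doc_readers (
            doc_id INTEGER NOT NULL REFERENCES docs(id),
            reader TEXT NOT NULL,
            PRIMARY KEY (doc_id, reader)
        );
        CREATE INDEX idx_doc_readers_reader ON doc_readers(reader);
        """
    )
    conn.executemany(
        "INSERT INTO docs (id, subject) VALUES (?, ?)",
        [(1, "Budget"), (2, "Roadmap"), (3, "Public memo")],
    )
    conn.executemany(
        "INSERT INTO doc_readers (doc_id, reader) VALUES (?, ?)",
        [(1, "CN=Alice/O=Acme"), (1, "[Finance]"), (2, "Managers")],
    )
    return conn


def visible_docs(conn, names):
    """Return the documents a user may see.

    `names` is the user's name plus every group and role the user holds.
    Documents without any reader entry are treated as unrestricted.
    """
    placeholders = ",".join("?" for _ in names)
    sql = f"""
        SELECT d.id, d.subject
        FROM docs d
        WHERE NOT EXISTS (SELECT 1 FROM doc_readers r WHERE r.doc_id = d.id)
           OR EXISTS (
                SELECT 1 FROM doc_readers r
                WHERE r.doc_id = d.id AND r.reader IN ({placeholders})
           )
        ORDER BY d.id
    """
    return conn.execute(sql, list(names)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(visible_docs(conn, ["CN=Bob/O=Acme", "Managers"]))
```

Every query against docs has to carry this visibility filter, which is the part worth measuring once the tables hold a million or more rows.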

As usual YMMV.

Posted by Stephan H Wissel on 23 January 2010 | Comments (6) | categories: Show-N-Tell Thursday

posted by Vince Schuurman on Sunday 24 January 2010 AD:
Currently at 2.4 million, but we have to be very careful with mass changes on data because the server crawls when the indexer runs.

posted by Darren Oliver on Sunday 24 January 2010 AD:
We have Notes databases of a similar size, however we find that in the case of a view corruption, it can take over 6 hours to rebuild one view. While it doesn't have any date formulas, it does have multiple sortable columns, which obviously increases the size of the index (and therefore the time taken to rebuild).

Because of this, we are in the middle of reviewing the design - not moving away from Notes but rather looking at the data and working out ways to either archive the data or provide different ways to access the specific data they want.

While others would look at our architecture and say "Notes doesn't work - let's go to a RDBMS", we're saying, let's look at the architecture and the data and work out how we can improve and optimise it.

I'm certainly not anti-RDBMS, and a portion of our new application will be Ruby on Rails/MySQL, but with any system it's worth looking into the data and how and why it's used, and determining the best way of getting access to it.

Just my two cents worth...

posted by Nathan T. Freeman on Sunday 24 January 2010 AD:
pish posh. Let me know when you break 2 million documents. Then we'll be in the same ballpark. At 5 million, I'll get excited.

Of course, it helps to know how to do more with just one index, instead of the 150 that most Notes apps tend to have.

posted by tom oneil on Monday 25 January 2010 AD:
We let an archive accidentally build to 4.3 million records.

Granted, this database is not used for transactions, it gets data dumped in weekly... but it runs fine.

posted by Darren Oliver on Monday 25 January 2010 AD:
Our largest is only 1.5 mill but it takes up 15.2 GB. The view indexes (indices?) are over 10 GB of that space (although the view index dialog only says 2 GB - not sure what's up with that) - check out { Link }

This is perhaps what Nathan was talking about - multiple views doing the same or similar thing. Something we'll be looking at in the design review.

posted by Erik Brooks on Saturday 30 January 2010 AD:
NSF scales fine. There are three pieces that don't:

1. Readers' fields (you've covered this)
2. Complex or time-based views
3. Any self-written pseudo-indexing code (e.g. JOINs, or if you wrote code to allow your users to sort a view by category totals.) If your indexing options can't be specified in a view you're stuck writing script/java/ssjs code to do really heavy crunching of data sets.

I've got several DBs with anywhere from 3mil to 10mil+. The biggest have views set to "manual" and are fairly static, and those fly.

But I've also got DBs with a mere 50,000 docs that fall under #3 and they're slow in comparison.