Loading HTML or XML Content in LotusScript over HTTP

Your application needs data that is stored on a web server. If that data is available through a web service, you are lucky: since R8, web service clients are supported in LotusScript. If you want to load data from a plain URL, you are out of luck. Typically you would resort to ActiveX and use the IE component to do the retrieval, which introduces three evils: a Windows dependency, an IE dependency and an ActiveX dependency. The other way is to use Java, which turns a lot of LotusScript developers off. The solution is a ready-made library that wraps all the Java you need into a convenient LotusScript class. The use case I had was to read HTML from a remote site and return a specific table for further processing, so my class has an XPath parameter that lets you slice out a part of the returned HTML. This is how you would use it in LotusScript:

%REM
Agent UpdateHTMLOnChange
Created May 28, 2010 by Stephan H Wissel
Description: Reads all documents that have been flagged
as changed and retrieves the update HTML
%END REM
Option Public
Option Declare

Use "HTTPUpdatesLS"

Sub Initialize
    Dim updateClass As HTTPUpdates
    Set updateClass = New HTTPUpdates
    Call updateClass.UpdatePendingDocuments()
    Set updateClass = Nothing
End Sub

And this is the complete LotusScript class:

%REM
Library HTTPUpdatesLS
Created May 29, 2010 by Stephan H Wissel
Description: Wrapper Class around the LS2J Classes
for HTTP driven updates
%END REM
Option Public
Option Declare

Use "HTTPUpdates"
UseLSX "*javacon"
Use "OpenLogFunctions"
%REM
Class HTTPUpdates
Description: LotusScript Wrapper around
Java Class for HTTPUpdates
%END REM

Public Class HTTPUpdates
    'For the LS2JReader
    Private jSession As JavaSession
    Private httpReaderClass As JavaClass
    Private httpReader As JavaObject
    Private s As NotesSession
    Private db As NotesDatabase
    Private viewName As String
    Private serverName As String
    Private dbName As String

    %REM
        Sub New
        Description: Initialize the class
    %END REM
    Public Sub New
        Call populateDefaults()
    End Sub

    %REM
        Sub UpdatePendingDocuments
        Description: Walks the pendingHTMLUpdates view and
        updates the HTML of every document in it
    %END REM
    Public Sub UpdatePendingDocuments

        Dim v As NotesView
        Dim doc As NotesDocument
        Dim nextDoc As NotesDocument

        Set db = s.Currentdatabase
        Set v = db.Getview("pendingHTMLUpdates")

        'Now loop through the view. We fetch the next document first,
        'in case the update removes the current one from the view
        Set doc = v.Getfirstdocument()

        Do Until doc Is Nothing
            Set nextDoc = v.Getnextdocument(doc)

            Call updateDocHTML(doc)

            Set doc = nextDoc
        Loop

        'And close it down
        Call httpReader.recycle()

    End Sub

    %REM
        Sub updateDocHTML
        Description: Here goes the update of one individual document
    %END REM
    Public Sub updateDocHTML(doc As NotesDocument)

        'We don't let one error derail us
        On Error GoTo Err_updateDocHTML

        Dim unid As String
        Dim result As String

        unid = doc.universalid
        'Here all the magic happens
        result = httpReader.getDocument(unid)

        If result <> "" Then 'We only save if we got something
            If doc.hasItem("FinalHTML") Then
                Call doc.removeItem("FinalHTML")
            End If
            Call doc.replaceItemValue("FinalHTML", result)
            Call doc.replaceItemValue("HTMLStatus", "1")
            Call doc.save(True, True)
        End If

Exit_updateDocHTML:
        Exit Sub

Err_updateDocHTML:
        Call logErrorEx(Error$, SEVERITY_HIGH, doc)
        Resume Exit_updateDocHTML

    End Sub

    %REM
        Function getRemoteHTML
        Description: get arbitrary HTML
    %END REM
    Public Function getRemoteHTML(url As String) As String

        'We don't let one error derail us
        On Error GoTo Err_getRemoteHTML

        getRemoteHTML = httpReader.getURL(url)

Exit_getRemoteHTML:
        Exit Function

Err_getRemoteHTML:
        Call logErrorEx(Error$, SEVERITY_HIGH, Nothing)
        getRemoteHTML = "<h3>Error:" + Error$ + "</h3>"
        Resume Exit_getRemoteHTML

    End Function

    %REM
        Sub populateDefaults
        Description: Populate default settings,
        presuming the Notes server is known by
        its common name in the DNS
    %END REM
    Private Sub populateDefaults
        Set s = New NotesSession
        Set db = s.Currentdatabase

        If db.Server = "" Then ' it runs local
            servername = "http://localhost"
        Else
            servername = Me.GetServerURL(db.Server)
        End If

        'We use the replica ID to be safe from
        'moved databases and the peril of local nsf names
        dbName = "__" + db.Replicaid + ".nsf"
        viewname = "0"

        'Create an HTTPReader instance
        Set jSession = New Javasession
        Set httpReaderClass = jSession.getClass("org.lotususers.tools.HTTPReader")
        Set httpReader = httpReaderClass.CreateObject()

        Call httpReader.setServerURL(servername)
        Call httpReader.setDatabaseURL(dbName)
        Call httpReader.setViewName(viewname)
        Call httpReader.setXPath("//body/*")
        Call httpReader.setUseSSO(False)
        'ToDo: Username & Password from a profile - don't hardcode here!
        If Not db.Server = "" Then
            'Call httpReader.setUserName("user")
            'Call httpReader.setPassWord("password")
        End If

    End Sub

    ' Cleanup
    Public Sub Delete
        On Error Resume Next
        If Not Me.httpReader Is Nothing Then
            Call httpReader.recycle()
        End If

    End Sub

    %REM
        Function GetServerURL
        Description: Gets the server name from the NAB
    %END REM
    Private Function GetServerURL(serverName As String) As String
        Dim domDir As NotesDatabase
        Dim sView As NotesView
        Dim doc As NotesDocument
        Dim n As NotesName
        Set domDir = New NotesDatabase(serverName, "names.nsf")
        If Not domDir.Isopen Then
            Call domDir.Open("", "")
        End If
        If Not domDir.isOpen Then
            'No directory available: fall back to the common name
            Set n = s.Createname(serverName)
            GetServerURL = "http://" + n.Common
            Exit Function
        End If

        Set sView = domDir.Getview("($Servers)")
        Set doc = sView.Getdocumentbykey(serverName, True)
        If doc Is Nothing Then
            'No server document: fall back to the common name and stop here,
            'otherwise the line below would fail on the missing document
            Set n = s.Createname(serverName)
            GetServerURL = "http://" + n.Common
            Exit Function
        End If

        GetServerURL = "http://" + doc.Getitemvalue("NetAddresses")(0)

    End Function

End Class
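
The defaults above (a server URL, a database path built from the replica ID, the view name "0" and a document UNID) are the four parts of a standard Domino document URL. How HTTPReader actually puts them together is not shown in this post, so the following is only a minimal Java sketch of one plausible way to do it. The class name DominoUrlSketch, the method documentUrl and the replica ID and UNID values are made up for illustration.

```java
// Hypothetical helper, not taken from HTTPReader.java: it only illustrates
// how the values set in populateDefaults could form a Domino document URL.
final class DominoUrlSketch {

    /** Joins server, database, view and UNID with single slashes. */
    static String documentUrl(String serverUrl, String databasePath,
                              String viewName, String unid) {
        StringBuilder url = new StringBuilder(stripTrailingSlash(serverUrl));
        url.append('/').append(databasePath)
           .append('/').append(viewName)
           .append('/').append(unid);
        return url.toString();
    }

    /** Avoids a double slash when the server URL already ends with one. */
    private static String stripTrailingSlash(String value) {
        return value.endsWith("/") ? value.substring(0, value.length() - 1) : value;
    }

    public static void main(String[] args) {
        // Replica ID and UNID below are invented sample values
        System.out.println(documentUrl(
                "http://localhost",
                "__852566C90012664F.nsf",
                "0",
                "0123456789ABCDEF0123456789ABCDEF"));
    }
}
```

Addressing the database by replica ID and the document by UNID keeps the URL stable when the NSF is moved or renamed, which is the reason given in the comments of populateDefaults.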

Of course that class is useless without the Java class. Here you go: HTTPReader.java
It needs some libraries as dependencies:

Apache HttpClient
Apache Commons Logging (an HttpClient dependency)
HtmlCleaner, which turns the HTML you read into well-formed markup so it can be parsed
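
Once the markup is well-formed, the clipping itself is plain XPath. Here is a minimal sketch of that step using only the JDK's own XML classes. The names XPathSliceSketch and slice are made up for illustration and are not taken from HTTPReader.java, and the input is assumed to be cleaned already, since the JDK parser rejects tag soup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: clips the nodes matched by an XPath expression
// out of well-formed markup and returns them as one string.
final class XPathSliceSketch {

    static String slice(String xhtml, String xpath) {
        try {
            // Parse the (already cleaned) markup into a DOM tree
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Select every node the expression matches
            NodeList hits = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, dom, XPathConstants.NODESET);

            // Serialize each match back to text, without an XML declaration
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            for (int i = 0; i < hits.getLength(); i++) {
                serializer.transform(new DOMSource(hits.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not slice markup with " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Demo</title></head><body>"
                + "<h1>Title</h1><table><tr><td>42</td></tr></table></body></html>";
        // "//body/*" is the expression the LotusScript class sets by default
        System.out.println(slice(page, "//body/*"));
        // A narrower expression returns just the table
        System.out.println(slice(page, "//table"));
    }
}
```

With "//body/*" you get everything inside the body; a narrower expression such as "//table" returns only the table, which was the original use case.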

As usual YMMV.

Posted by Stephan H Wissel on 01 December 2010 | Comments (6) | categories: Show-N-Tell Thursday

posted by Peter on Thursday 02 December 2010 AD:
Nice work, but don't forget that you can get a web page through pure LS too: { Link }

/Peter

posted by Peter on Thursday 02 December 2010 AD:
Do you have a link to some technote or something saying that the web retriever has been discontinued? I'm still using it in 8.5.2...

I definitely agree that there are several disadvantages in using it, but it also has some advantages.

posted by Stephan H. Wissel on Thursday 02 December 2010 AD:
@Peter
the web retriever task has been discontinued in R7, the code breaks on clients easily, you need to modify your database design, doesn't work for XML and doesn't allow clipping.

Too many disadvantages for pure LS. Even IBM suggested Java. I like the HTTP Client over a bare-bones URLConnection since it handles cookies, sign-on and redirects.

posted by Stephan H. Wissel on Friday 03 December 2010 AD:
The task is still there (fiercely backward compatible) but hasn't received any upgrade for a long, long time. That's what I meant by "discontinued".

posted by Hinshaw on Sunday 05 December 2010 AD:
Awesome work, I think there is nothing that could be improved.

posted by Mark Haller on Wednesday 07 November 2012 AD:
Hi Stephan

Love this post. Is this still your preferred method for retrieving a webpage in LS? Reason I ask - I'm fed up trying to fix msxml3 errors on my servers and need an alternate after using MSXML for a loooong time!

Just followed you on Twitter (@LogicSpot)

Would love some help soonest! :-)

Mark