Doug Steele Gold Collection
![]()
In the spirit that "everything is data," Doug Steele looks at how to build a database of favorites (bookmarks) from data extracted from several different Web browsers. Along the way, he provides a very efficient routine for processing files and a number of helper functions for processing files (including INI files).
Is it possible to import my Favorites from Internet Explorer and store them in a table?
For those of you who haven't looked, Internet Explorer stores each entry in your Favorites list as a separate file (with an extension of .url). For example, the favorite I have saved for the Smart Access Web site is named Pinnacle Publishing Smart Access.url, and its content is:
[DEFAULT]
BASEURL=http://www.smartaccessnewsletter.com/ME2/
Audiences/Default.asp
[InternetShortcut]
URL=http://www.smartaccessnewsletter.com/ME2/Audiences/
Default.asp
Modified=B04FC6954E6BC501D9
That means that your question can be viewed as having three separate parts:
1. Find all of the favorites files.
2. Read the contents of each file.
3. Store the information.
How do I find all files meeting a specific criterion?
There are three common approaches to running through files on a hard disk:
• Use the Dir function.
• Use FileSystemObject (FSO) from the Scripting library.
• Use the FindFirstFile, FindNextFile, and FindClose APIs.
One basis for making the decision is which gives the best performance. Tests I did yielded the results shown in Table 1. These results reflect how long it took to find the 5491 files on my hard drive. As you can see, using the API functions is the fastest (no surprise, since the other methods are probably using those functions themselves).
Table 1. Relative speeds of different file enumeration methods.
Method |
Time (seconds) |
Using Dir |
1.222 |
Using FSO |
10.285 |
Using APIs |
0.191 |
In the interest of space, I'm only going to talk about using the Dir function, since it combines efficiency with ease of understanding. (If you want information on the other methods, I wrote an article about these functions for the May 2005 issue of Access Advisor.)
The VBA Dir function "returns a String representing the name of a file, directory, or folder that matches a specified pattern or file attribute, or the volume label of a drive" (to quote the VBA Help file). The first time you call the Dir function, you must provide a string expression that specifies a file name (which may include folder and drive, as well as wildcards). Dir returns the first file name that matches the pattern supplied. To get any additional file names that match the pattern, you call Dir again with no arguments.
The Dir function works fine to list all of the files in a single folder or to get a list of all subfolders in a given folder. The problem, though, is that you can't nest Dir calls, making it difficult to create a recursive function to handle files in subfolders.
One solution to processing subfolders is to store subfolders in a collection or array as the Dir function finds them. When you finish processing a directory, you'll have a list of all of the subfolders in the directory. You can then use the Dir function to find the files in each subfolder. This sounds like a useful routine and, to make it generic, all that's necessary is to have the function accept a starting directory and a file pattern as parameters. My version of the routine uses a collection to hold the list of subfolders and is designed to run recursively. In actual fact, I use two separate collections in the routine: the "main" collection that holds all of the files found and that gets passed between routines, and the "local" collection that's local to the routine that holds the folders found in the currently processed directory. Here's the routine:
Public Sub FindFiles( _
StartDir As String, _
FilePattern As String, _
FileList As Collection _
)
Dim strFile As String
Dim strFolder As String
Dim strSubFolder As String
Dim varFolder As Variant
Dim colSubfolders As Collection
Assuming that a starting directory was passed, I first ensure that the folder name ends in a slash. I've created a helper function called QualifyFolderPath to do this (see the code after this code listing for what the QualifyFolderPath function looks like):
If Len(StartDir) > 0 Then
strFolder = QualifyFolderPath(StartDir)
I now add the name of each file in the folder to the "main" collection:
strFile = Dir$(strFolder & FilePattern)
Do While Len(strFile) > 0
FileList.Add strFolder & strFile
strFile = Dir$
Loop
I then build a list of subfolders, adding each subfolder's name to the "local" collection:
Set colSubfolders = New Collection
strSubFolder = Dir$(strFolder, vbDirectory)
Do While Len(strSubFolder) > 0
If strSubFolder <> "." And _
strSubFolder <> ".." Then
If (GetAttr(strFolder & strSubFolder) _
And vbDirectory) = vbDirectory Then
strSubFolder = strFolder & strSubFolder
colSubfolders.Add strSubFolder
End If
End If
strSubFolder = Dir$
Loop
Finally, I recursively process each of the subfolders found above. Because I'm enumerating the elements of a collection that contains strings rather than objects, I have to use a variant to enumerate the loop. However, the function is expecting a string, so I have to convert the variant back to a string after pulling it from the collection:
For Each varFolder In colSubfolders
Call FindFiles(CStr(varFolder), _
FilePattern, FileList)
Next varFolder
End If
End Sub
As mentioned earlier, here's QualifyFolderPath, which ensures that each folder ends with a slash:
Private Function QualifyFolderPath( _
PathName As String _
) As String
If Len(PathName) > 0 Then
If Right$(PathName, 1) <> "\" Then
QualifyFolderPath = PathName & "\"
Else
QualifyFolderPath = PathName
End If
Else
QualifyFolderPath = ""
End If
End Function
In order to use this routine to determine all of the favorites stored on the machine, you have to know where to look. The folder in which Internet Favorites are stored is a so-called "Special Folder," and there are API calls that can be used to determine the location of any Special Folder. The following code uses the SHGetSpecialFolderLocation and SHGetPathFromIDList API calls to return the location of the user Favorites folder (which, in Windows XP, is usually going to be C:\Documents and Settings\<user id>\Favorites):
Private Const CSIDL_FAVORITES = &H6
Private Const MAX_PATH = 260
Private Const S_OK = 0
Private Declare Function SHGetPathFromIDList _
Lib "shell32" Alias "SHGetPathFromIDListA" ( _
ByVal pidl As Long, _
ByVal pszPath As String _
) As Long
Private Declare Function _
SHGetSpecialFolderLocation Lib "shell32" ( _
ByVal hwndOwner As Long, _
ByVal nFolder As Long, _
pidl As Long _
) As Long
Private Declare Sub CoTaskMemFree Lib "ole32" _
(ByVal pv As Long)
Function GetFavoritesFolder() As String
Dim strPath As String
Dim lngPIDL As Long
If SHGetSpecialFolderLocation(0, _
CSIDL_FAVORITES, lngPIDL) = S_OK Then
strPath = Space$(MAX_PATH)
If SHGetPathFromIDList( _
ByVal lngPIDL, ByVal strPath) Then
GetFavoritesFolder = _
Left(strPath, InStr(strPath, Chr(0)) - 1)
End If
Call CoTaskMemFree(lngPIDL)
End If
End Function
Now that you have a way of determining where to start looking for the favorites, you can use the FindFiles routine using code like this:
Public Sub TestFind()
Dim colFiles As Collection
Dim intLoop As Integer
Dim strStartDir As String
Dim strMessage As String
Dim varFile As Variant
strStartDir = GetFavoritesFolder()
Set colFileList = New Collection
Call FindFiles(strStartDir, "*.url", colFiles)
If colFiles.Count = 1 Then
strMessage = "There is 1 file under "
Else
strMessage = "There are " & _
colFileList.Count & " files under "
End If
strMessage = strMessage & strStartDir
Debug.Print strMessage
For Each varFile In colFileList
Debug.Print varFile
Next varFile
Set colFiles = Nothing
End Sub
How can I read the contents of a Favorites file?
Take another look at the contents of Pinnacle Publishing Smart Access.url that I showed earlier. It turns out that the file can contain a lot of different pieces of information in it, but the relevant part for our exercise (the actual URL associated with the favorite) is the following:
[InternetShortcut]
URL=http://www.smartaccessnewsletter.com/ME2/Audiences/
Default.asp
You may notice that that this file's format is identical to an INI file, a fact that we can use to our advantage, since Windows' GetPrivateProfileString API is designed for reading INI files.
The parts of an INI file are:
[section]
key=value
The declaration for the GetPrivateProfileString is:
Private Declare Function _
GetPrivateProfileString Lib "kernel32" _
Alias "GetPrivateProfileStringA" ( _
ByVal lpSectionName As String, _
ByVal lpKeyName As Any, _
ByVal lpDefault As String, _
ByVal lpReturnedString As String, _
ByVal nSize As Long, _
ByVal lpFileName As String _
) As Long
You'll need to supply values for all but lpReturnedString:
• lpSectionName–The section from the INI file schematic shown earlier ("InternetShortcut" in the case of the Favorite file).
• lpKeyName–The key from the INI file schematic shown earlier ("URL" in the case of the Favorite file).
• lpDefault–What value to use if the section/key combination isn't found in the file.
• lpReturnedString–A buffer that's been initialized to a fixed number of spaces that will receive the value of the key in the INI file.
• nsize–The number of spaces to which the buffer has been initialized.
• lpFileName–The name of the INI file (in our case, the name of the Favorite file).
If the function call succeeds, the function returns the number of characters copied to the buffer, not including the terminating null character.
So, to read the value of the URL key in the InternetShortcut section of a Favorites file, you'd use code similar to this:
Function GetURL(FileName As String) As String
Dim lngSize As Long
Dim lngSuccess As Long
Dim strReturn As String
strReturn = Space$(2048)
lngSize = Len(strReturn)
lngSuccess = GetPrivateProfileString( _
"InternetShortcut", "URL", _
"", strReturn, lngSize, FileName)
If lngSuccess > 0 Then
GetURL = Left$(strReturn, lngSuccess)
End If
End Function
How can I store the Favorites information?
What I did was design a table that could hold the site information. Much to my surprise, I discovered that some of the URLs associated with favorites were in excess of 255 characters, so I was forced to use a Memo field to hold the URL. The fields I used in my table are listed in Table 2.
Table 2. My "Favorites" table.
Field name |
Field type |
FavoriteFolderNm |
Text(255) |
SiteNm |
Text(255) |
URLTx |
Memo |
I decided that I wouldn't store the entire name of the Favorites files. Instead, I'd store only that part of the path after whatever was returned by GetFavoritesFolder. In other words, if my favorites are stored under D:\Documents and Settings\DJSteele\Favorites and I have a specific Favorite stored in D:\Documents and Settings\DJSteele\Favorites\Microsoft\MSDN Advanced Search.url, I'll store Microsoft as the FavoriteFolderNm, and MSDN Advanced Search as the SiteNm.
Having said all that, here's code that can find all of the favorites and store them in the database. The routine is passed the starting directory to search for the Favorites file:
Public Sub FindFavorites( _
Optional StartDir As String = "" _
)
Dim colFavorites As Collection
Dim lngStartFolderLength As Long
Dim strFile As String
Dim strFolder As String
Dim strStartFolder As String
Dim strSQL As String
Dim varFile As Variant
I then make sure that the program knows where to look for the favorites files (and that the starting folder ends with a back slash):
If Len(StartDir) = 0 Then
strStartFolder = _
QualifyFolder(GetFavoritesFolder())
Else
strStartFolder = QualifyFolder(StartDir)
End If
lngStartFolderLength = Len(strStartFolder) + 1
With the initialization just about done, I then create a new collection to hold the file names, and call FindFiles, passing it the appropriate values to return the list of favorites files:
Set colFavorites = New Collection
Call FindFiles(strStartFolder, "*.url", _
colFavorites)
If I do find some favorites files, I loop through them. For each entry in the collection of Favorite files, I separate the full path into the Folder and File names, stripping off the "unique" part of the folder, and removing the .url from the end of the file name. Now that I have the file name, I retrieve the URL from within the file using the GetURL function shown earlier, and use a SQL Insert Into statement to store the values in my table:
If colFavorites.Count > 0 Then
For Each varFile In colFavorites
strFile = Dir$(varFile)
strFolder = Mid$(Left$(varFile, _
Len(varFile) - Len(strFile)), _
lngStartFolderLength)
strFile = Left$(strFile, Len(strFile) - 4)
If Right$(strFolder, 1) = "\" Then
strFolder = Left$(strFolder, _
Len(strFolder) - 1)
End If
strSQL = "INSERT INTO Favorites ( _
FavoriteFolderNm, SiteNm, URLTx) " & _
"VALUES(" & FixText(strFolder) & ", " & _
FixText(strFile) & ", " & _
FixText(GetURL(CStr(varFile))) & ")"
CurrentDb.Execute strSQL, dbFailOnError
Next varFile
End If
End Sub
Here again, I use another helper function–FixText, which ensures that the quotes in the SQL statement are appropriate:
Private Function FixText( _
InputText As String, _
Optional Delimiter As String = "'" _
) As String
Dim strTemp As String
strTemp = Delimiter
strTemp = strTemp & Replace(InputText, _
Delimiter, _
Delimiter & Delimiter)
strTemp = strTemp & Delimiter
FixText = strTemp
End Function
What about Favorites (or Bookmarks) for browsers other than Internet Explorer?
Firefox and Netscape essentially store their bookmarks as a Web page. Here's an extract of a Firefox bookmark file (yes, even though it says !DOCTYPE NETSCAPE-Bookmark-file-1, it is Firefox!):
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<!-- This is an automatically generated file.
It will be read and overwritten.
DO NOT EDIT! -->
<META HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1 LAST_MODIFIED="1117918695">Bookmarks</H1>
<DL>
<p>
<DT>
<H3 ADD_DATE="1102425622"
ID="rdf:#$j.0eT1">Dell</H3>
<DL>
<p>
<DT>
<A HREF="http://www.dell.com/"
ADD_DATE="1102425622"
ID="rdf:#$l.0eT1">Dell</A>
<DT>
<A HREF="http://support.dell.com/"
ADD_DATE="1102425622"
ID="rdf:#$n.0eT1">Support.Dell.com</A>
</DL>
<p>
<DT>
<A HREF="http://communities.microsoft.com/
newsgroups/default.asp?icp=whidbey&
slcid=us"
ADD_DATE="1102425622" LAST_VISIT="1115580162"
LAST_CHARSET="ISO-8859-1"
ID="rdf:#$v.0eT1">Microsoft Newsgroups</A>
<DT>
<A HREF="http://www.microsoft.com/downloads/
details.aspx?FamilyId=FE118952-3547-420A-A412
-00A2662442D9&displaylang=en"
ADD_DATE="1102425622"
ID="rdf:#$z.0eT1">Office 2003 XML Schemas</A>
</DL>
<p>
Each bookmark (or favorite) has its information stored in an HTML Anchor element, with the HREF element containing the URL, and what's between the <A> and </A> being the name of the bookmark. While a number of different elements may be included in the <A> tag, I'm only concerned with the value of the HREF element.
As before, that means that the problem can be viewed as having three separate parts:
1. Find all of the Anchor elements in the file.
2. Determine the contents of each Anchor element.
3. Store the information.
Reading the contents of a file into a variable is straightforward: Just determine the size of the file (using the FileLen function) and initialize a buffer to that size. You can then use the VBA Open statement to open the file for input, the VBA Get statement to read the contents of the file into a variable, and the VBA Close statement to close the file when complete, as this code does:
Function GetContentsOfFile( _
WhatFile As String _
) As String
Dim intFile As Integer
Dim lngFileSize As Long
Dim strContents As String
If Len(Dir(WhatFile)) > 0 Then
lngFileSize = FileLen(WhatFile)
strContents = Space(lngFileSize)
intFile = FreeFile()
Open WhatFile For Binary As intFile
Get #intFile, , strHTML
Close #intFile
End If
GetContentsOfFile = strContents
End Function
Once the contents of the file have been stored in a variable, the Split function can be used to divide the HTML into the various Anchor elements by splitting on every occurrence of <A. What's in each element of the resultant array will have to be parsed again into the URI (Uniform Resource Identifier, the official name for what's stored as the HREF element) and the Anchor value (the text between <A ...> and </A>).
To pull out the data, I created two more "helper functions." The first is GetHREF, which extracts the URI. It begins by declaring some variables:
Private Function GetHREF( _
Anchor As String _
) As String
Dim lngStart As Long
Dim lngEnd As Long
Dim lngTotalLength As Long
Dim strDelimiter As String
Dim strHREF As String
I then check to see whether the expression HREF= occurs in Anchor text:
lngTotalLength = Len(Anchor)
lngStart = InStr(1, Anchor, _
"HREF=", vbTextCompare)
The URI must be delimited after HREF=. That delimiter might be either a single quote (', or Chr$(39)) or a double quote (", or Chr$(34)). However, there can also be white space after HREF=, so I need to handle that possibility. I start by looking at the character after the = sign, which is five positions after lngStart, and then keep looking until the first ' or " is found:
If lngStart > 0 Then
lngStart = lngStart + 5
strDelimiter = Mid(Anchor, lngStart, 1)
Do While strDelimiter <> Chr$(39) And _
strDelimiter <> Chr$(34)
lngStart = lngStart + 1
If lngStart > lngTotalLength Then
strDelimiter = vbNullString
Exit Do
End If
strDelimiter = Mid(Anchor, lngStart, 1)
Loop
Assuming a delimiter was found, I then look for the next occurrence of the delimiter that I found at the start. Whatever is between is the URI:
If Len(strDelimiter) > 0 Then
lngEnd = InStr(lngStart + 1, _
Anchor, strDelimiter, vbTextCompare)
If lngEnd > 0 Then
strHREF = Mid(Anchor, lngStart + 1, _
lngEnd - lngStart - 1)
End If
End If
End If
GetHREF = strHREF
End Function
My second helper routine is GetLinkName, and it finds the name of the favorite. Determining the actual name of the link in the Anchor element is fairly straightforward. I find where the first > character occurs in the Anchor element, and then find where the closing </A> occurs. What's between those two positions is the name of the link:
Private Function GetLinkName( _
Anchor As String _
) As String
Dim lngStart As Long
Dim lngEnd As Long
Dim strDelimiter As String
Dim strLink As String
lngStart = InStr(1, Anchor, ">")
If lngStart > 0 Then
lngEnd = InStr(lngStart + 1, HTML, "</A>")
If lngEnd > 0 Then
strLink = Trim$(Mid(HTML, lngStart + 1, _
lngEnd - lngStart - 1))
End If
End If
GetLinkName = strLink
End Function
Using these various building blocks, it's possible to read the bookmark file and store its content in my table:
Sub ReadBookmarkFile(BookmarkFile As String)
Dim lngLoop As Long
Dim strCurrElement as String
Dim strHREF As String
Dim strHTML As String
Dim strName As String
Dim strOutput As String
Dim strSQL As String
Dim varElements As Variant
I'm a suspicious guy, so I first make sure that a valid Bookmark file was actually passed. I have to check for both the length of the value and the length returned by the Dir function. If an empty string is passed, then Dir("") will give you a file in the current directory, which isn't what I want:
If Len(BookmarkFile) > 0 Then
If Len(Dir(BookmarkFile)) > 0 Then
Once I get the contents of the file, using one of my helper functions, I split it into the contents into individual Anchor elements:
strHTML = GetFileContent(BookmarkFile)
varElements = MySplit(strHTML, "<A", _
-1, vbTextCompare)
For each Anchor element found, I use my last two helper functions to get the favorite's URL and the name of that URL. I then write that information to my table:
For lngLoop = LBound(varElements) To _
UBound(varElements)
strCurrElement = varElements(lngLoop)
If InStr(1, varElements(lngLoop), _
"</A>", vbTextCompare) > 0 Then
If InStr(1, strCurrElement, _
"HREF", vbTextCompare) > 0 Then
strHREF = _
GetHREF(strCurrElement)
strName = _
GetLinkName(strCurrElement)
strSQL = "INSERT INTO Favorites " & _
"(FavoriteFolderNm, SiteNm, " & _
"URLTx) " & _
"VALUES(" & _
FixText(BookmarkFile) & ", " & _
FixText(strName) & ", " & _
FixText(strHREF) & ")"
CurrentDb.Execute strSQL, _
dbFailOnError
End If
End If
Next lngLoop
End If
End If
End Sub
And I'm not done yet–but I've run out of space. Next month, I'll address another Web browser (Opera) and show how to read links from Internet Web pages that have collections of URLs.
Downloads
Your download file is called 508STEELE.ZIP in the file SA2005-08down.zip
This is found in the Gold Collection at http://www.vb123.com/smart/