ORBIT's PowerSearch: a powerful tool for patent searching. (online database
searching)
Cloutier, Kathleen A.; Kaback, Stuart M.
May, 1994
Patent information is the primary focus of the ORBIT Online Service
[1,2]. ORBIT's new PowerSearch capabilities reflect its understanding of
the unique aspects of patent searching. PowerSearch is a set of commands
that helps a user customize the searching environment. Searchers can now
blend results from single or multifile searches, group patent documents
dealing with the same invention, and flexibly define the concept of a
duplicate for both bibliographic and patent records.
MULTIFILE SEARCHING
As a tool for searching multiple files, PowerSearch extends the
utility of DIALOG's OneSearch and Data-Star's StarSearch. OneSearch was
introduced in 1987. It searches an entire set of databases at once. This
approach is satisfactory for some types of searches, such as an author or
specific term search. Other searches must exploit the unique features of a
particular database, and this is very difficult to do in a OneSearch
environment.
StarSearch was released in 1991. Data-Star recognized the user's need
to manipulate individual databases in different ways. StarSearch
sequentially searches each database of interest, either with the previous
strategy or a new one. Ann Van Camp [3] compared the two approaches,
beginning her article by stating, 'Multifile searching just got easier.'
Now it is even easier. ORBIT's PowerSearch approach allows the user
to expand or narrow the set of databases searched at any point during a
multifile session. The results from any search statements can be combined
as desired. As we shall see, this ability to control and flex the search
environment is very valuable.
STN has introduced a similar approach. The MESSENGER software has
always kept the results of all search statements from all the files used in
a session. Users can easily go back and forth between files using previous
search strategies and results as desired. In late 1992, the FILE command
was expanded to permit simultaneous use of multiple files. As in
PowerSearch, it is possible in STN to continually change the file domain
and later merge the results from the sets of interest.
DUPLICATE DETECTION
AND GROUPING
ORBIT's PowerSearch excels in its definition of a patent 'duplicate.'
In 1989, DIALOG became the first databank to provide bibliographic
duplicate detection. As Carmen Miller [4] noted, it was 'a searcher's dream
come true.' DIALOG has never extended this capability to patent searching.
STN's duplicate identification includes patents. Unfortunately, duplicates
in this system are defined in terms of documents, rather than inventions.
The concept of an invention as a set of related documents is ignored.
Derwent's World Patents Index, a key patent database, is available on STN
but is not included in the set of files with duplicate detection
capability.
PowerSearch can create patent and bibliographic groups of records
that describe the same invention or source document. A group contains a
'first record' (decided by file order) against which related and duplicate
records are determined. The searcher can print all members of each group,
just the first record from each group, all records except duplicates, or
duplicates only.
ORBIT defines a duplicate patent record as one that 'represents the
same patent publication and contains the same priority or application
data.' Related records 'reference the same invention and contain additional
patent, priority or application number information' [5]. The duplicate
detection for a patent number links the nine-character Derwent format only.
The kind or status codes associated with it are ignored. Thus, a published
European patent application is considered a duplicate of the granted
European patent for the same invention.
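The effect of ignoring kind codes can be sketched in a few lines of Python. This is only an illustration, not ORBIT's matching routine: the function names are hypothetical, the patent numbers are made up, and the parsing rule (a two-letter country code followed by digits) is a simplifying assumption rather than the actual nine-character Derwent format.

```python
import re


def strip_kind_code(patent_number: str) -> str:
    """Return country code plus serial digits, dropping any trailing kind code.

    Assumption: a number looks like 'EP0123456A1' (two letters, digits,
    optional kind code). Real Derwent-format numbers are more constrained.
    """
    compact = re.sub(r"[\s\-/]", "", patent_number.upper())
    match = re.match(r"([A-Z]{2})(\d+)", compact)
    if match is None:
        raise ValueError(f"unrecognized patent number: {patent_number!r}")
    return match.group(1) + match.group(2)


def is_duplicate(number_a: str, number_b: str) -> bool:
    """Treat two publications as duplicates when only the kind code differs."""
    return strip_kind_code(number_a) == strip_kind_code(number_b)
```

Under this rule a published application such as EP0123456A1 and the granted patent EP0123456B1 compare as equal, which is the behavior described above.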
The grouping capability is extremely useful in pulling together data
about an invention. Combining fields from a variety of databases leads to a
more useful set of information. Details of patent claims, indexing and
family members can be merged without extensive post-search editing. The
expanded data can be helpful in screening results for relevancy and
providing needed legal information. Some users will find that they never
want to exclude duplicates, but will often prefer to group all of their
results together.
PowerSearch grouping for patents was released before bibliographic
grouping. A few files are available for bibliographic grouping now. Other
files with this capability are being added. In a bibliographic group,
citations with the same author and title elements are arranged together in
a print display. Duplicates are defined as having the same source as well.
Related records have different sources. Thus, records for a journal article
and the related conference citation are printed together when they have
similar titles.
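The bibliographic rule just described (same author and title elements form a group; a matching source makes a duplicate, a different source makes a related record) can be sketched as follows. The Citation class and classify function are hypothetical, and exact matching after case and whitespace normalization stands in for whatever similarity test ORBIT actually applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    author: str
    title: str
    source: str


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def classify(first: Citation, other: Citation) -> Optional[str]:
    """Compare a citation with the group's first record.

    Returns 'duplicate', 'related', or None when the two citations
    do not belong in the same group at all.
    """
    same_work = (_norm(first.author) == _norm(other.author)
                 and _norm(first.title) == _norm(other.title))
    if not same_work:
        return None
    if _norm(first.source) == _norm(other.source):
        return "duplicate"
    return "related"
```

A journal article and the conference citation for the same paper would come back as 'related' here, because only the source differs.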
THE COMMANDS OF
POWERSEARCH
How does ORBIT accomplish all of this? The 16-page PowerSearch System
Reference Guide describes the commands. Many of the new multifile functions
are expansions of older ORBIT commands. The FILE command can now be
followed by up to 40 filenames. As before, FILE automatically closes all
previously opened files and erases all existing search results. PowerSearch
should be viewed as a single 'FILE' session.
The ADD and REPEAT commands open additional files for searching. ADD
simply makes more files available. REPEAT not only adds files, but
automatically executes all or part of the existing search strategy in them.
Results from the new databases are automatically incorporated into the
existing results. The NEW command is used to restrict searching to a single
file or set of files. A NEW command puts all previously used files in an
'inactive' state. NEW must be used with a single file when using the patent
FAMILY command. It is also very useful when using special indexing or other
features unique to a file. DELETE and SUBTRACT are used to narrow the
search environment. Search results from different files are combined with
the MERGE command for printing, sorting or grouping. A merged set cannot be
used for searching, but the REPEAT command and Boolean operations on sets
from the same files can be used instead.
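A toy model may help fix how these commands change the session state. The Python sketch below is not ORBIT's implementation: the ToySession class, its method names (chosen to echo FILE, ADD, NEW, REPEAT and MERGE), the substring search, and the sample data are all assumptions made for illustration.

```python
class ToySession:
    """Toy model of a multifile session; every detail is illustrative."""

    def __init__(self, corpus):
        self.corpus = corpus   # {filename: {record_id: text}}
        self.active = []       # files that new searches run against
        self.inactive = []     # files set aside by NEW
        self.history = []      # [(query, {filename: set of record ids})]

    def file(self, *names):
        """FILE: close everything and erase all existing results."""
        self.active, self.inactive, self.history = list(names), [], []

    def add(self, *names):
        """ADD: make more files available without searching them."""
        for name in names:
            if name in self.inactive:
                self.inactive.remove(name)
            if name not in self.active:
                self.active.append(name)

    def new(self, *names):
        """NEW: restrict searching to the named files; others go inactive."""
        self.inactive = [n for n in self.active + self.inactive
                         if n not in names]
        self.active = list(names)

    def find(self, query):
        """Run a search statement in every active file."""
        hits = {name: self._search(name, query) for name in self.active}
        self.history.append((query, hits))
        return hits

    def repeat(self, *names):
        """REPEAT: add files and rerun the existing strategy in them."""
        self.add(*names)
        for query, hits in self.history:
            for name in names:
                hits[name] = self._search(name, query)

    def merge(self, index):
        """MERGE: flatten one statement's per-file hits for printing."""
        _, hits = self.history[index]
        return [(name, rid) for name in hits for rid in sorted(hits[name])]

    def _search(self, name, query):
        records = self.corpus[name]
        return {rid for rid, text in records.items()
                if query.lower() in text.lower()}
```

The point of the model is the contrast between file(), which wipes the history, and add(), new() and repeat(), which reshape the set of files while keeping earlier results.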
Many other commands have been adapted for PowerSearch. Cross-file
searching and statistical analysis (PRINT SELECT and GET) can be used with
multiple files. The PRINT command can either display records in file order,
or SORT can be used to rearrange records by a field. The HISTORY command
provides a summary of the results for each search statement along with the
postings from each file searched. FROM may be used after many commands to
restrict the operation to a single file or set of files.
A shortcoming of PowerSearch is that the NBR command for browsing
database indexes has not been enhanced to handle the multifile environment.
It can only be used with one file at a time. NBR defaults to the first
active file. ORBIT gives a very clear message that this has happened. To
search a different file, NBR FROM must be used with the filename. In this
case, DIALOG and STN are much more advanced with their comparable EXPAND
functions.
WHAT DOES POWERSEARCH
MEAN TO PATENT
SEARCHERS?
Patent searchers strive for a reasonable set of search results with
the highest recall possible. Information obtained from the patent
literature is often of major corporate importance. A search run in a single
database rarely retrieves all relevant information. Each patent
database deals with a different subset of the world's patent literature and
has a different mix of technical and legal information. Some databases
excel in specialized indexing and others in their organization of the legal
data. The currency of the files also varies considerably. Searching
multiple files and blending the results fully uses the strengths of each
database. Adding a file can help narrow search results, find additional
references, or augment the data concerning a particular invention.
To be effective, a patent search must:
1. Use the full capabilities of each database.
2. Use the information found in one database as a search statement in
another.
3. Search parts of a strategy in different databases.
4. Deal with results where multiple records represent the same
invention.
5. Review the results to determine which references are relevant to
the original query.
The addition of PowerSearch to ORBIT's searching, cross-file and
statistical capabilities helps the searcher in all five of these functions.
Item 1 is achieved with the NEW command, which operates in a single file
environment and allows the searcher to focus on special indexing and
features. Cross-file searching accomplishes Item 2 and has been enhanced to
operate on multiple files. The third operation can be performed by
searching several files at once or by using the REPEAT command to rerun all
or part of a previous strategy. A notable (and already mentioned) weakness
is the inability of the NBR command to work on more than one file at once.
Some patent files have many spelling errors that must be incorporated into
a search strategy. Pulling these together with one command would save time
and be extremely helpful.
PowerSearch's ability to group records is a tremendous help for the
fourth item. The searcher must still perform family searching to be sure
that all related records are in the answer sets. Grouping also assists item
5. The juxtaposition of enhanced titles, abstracts and portions of indexing
from several databases makes the screening of results much easier.
Some images from the World Patents Index should also be available by
the time this article is published. This addition of chemical structures
from 1992 will be invaluable, as will the availability of electrical images
back to 1988. Perhaps someday the abstracts from Chemical Abstracts will be
available as well.
TIPS FOR USING
POWERSEARCH
Here are some things to keep in mind when searching for patent
information with PowerSearch:
* The FILE command works the same way it always has! FILE is only
used once during a PowerSearch session. FILE erases all search results. If
you have trouble breaking the FILE habit you may want to rename the system
command to avoid mistakes. Use RENAME to change the command for the current
session. To make a permanent change, type:
TERM PROFILE
RENAME FILE TO xxxxx
(Choose a name that is not an ORBIT command. Avoid any word that you
may want to use as a search term.)
RESTART Y
* When you are ready to group documents, the order of the files is
very important. The number of duplicates identified depends on the file
order. The record from the first file is always kept. For example, a search
finds a USPM record citing a U.S. patent and a WPAT record that lists the
same U.S. patent as well as other equivalents. No duplicates will be found
when the file order is USPM, WPAT. If WPAT is the first file, the USPM
record will be identified as a duplicate. REORDER is the command to change
the sequence of files.
* If your search uses Chemical Abstract Registry Numbers, using PRINT
HIT ITT will give the indexing and the roles for them. This extra
information can be very useful in screening results.
* Grouping pulls information about family members together but is not
a family search. It is still necessary to search for the related records.
* Kind codes are not considered unique. Information about a granted
patent may be lost when duplicates are removed from a set. Remember that
duplicates are removed based on which database is keyed first when entering
the cluster.
* Cross-file searching is a powerful technique that may still be the
most effective way to perform many searches. I found it useful to search
information for use in subsequent files. PRINT SELECT is ORBIT's command
that transfers terms to a SELECT list. Each SELECT list can be named by
using the TOSEL option. It is very helpful to refer to the saved lists by
name. You must use PRINT SELECT if you would like to have a Derwent record
for every citation found in other files.
* XCLAIM is the command to move results from an expensive CLAIMS file
to a less costly one. In general, XCLAIM works very well in PowerSearch.
However, it appears to work but actually fails when the file you move to
has already been used in the session. In this case, the expensive file
remains open and is used for searching. (As a result of reviewing this
article prior to publication, ORBIT is in the process of fixing this
problem. - NG)
* MERGE is used for printing. Use REPEAT or other logic to create
searchable sets from multiple files.
* If none of the specified print fields exist for a record, only a
message indicating the problem is given. Printing the item number in the
set in a different format will obtain some information about the missing
record.
* The SUBS (subheading) command remains in effect for the entire
PowerSearch session. (SUBS CANCEL turns it off.)
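The file-order tip above (the USPM and WPAT example) can be made concrete with a small sketch. It assumes one plausible reading of ORBIT's definitions: a record is a duplicate of the group's first record when it carries no patent numbers beyond those in the first record, and related when it adds some. The label_group function and the patent numbers are hypothetical.

```python
def label_group(records, file_order):
    """Label the records of one invention relative to the first file's record.

    records: list of (file_name, frozenset_of_patent_numbers)
    file_order: list of file names, first file first (as set by REORDER)
    Assumption: 'duplicate' means the record adds no new numbers.
    """
    ordered = sorted(records, key=lambda rec: file_order.index(rec[0]))
    first_file, first_numbers = ordered[0]
    labels = [(first_file, "first")]
    for file_name, numbers in ordered[1:]:
        kind = "duplicate" if numbers <= first_numbers else "related"
        labels.append((file_name, kind))
    return labels


# Made-up numbers: USPM cites one U.S. patent, WPAT lists it plus equivalents.
uspm = ("USPM", frozenset({"US5000000"}))
wpat = ("WPAT", frozenset({"US5000000", "EP0400000", "JP02100000"}))

print(label_group([uspm, wpat], ["USPM", "WPAT"]))
print(label_group([uspm, wpat], ["WPAT", "USPM"]))
```

With USPM first, the WPAT record adds equivalents and so is only related, and no duplicate is reported. With WPAT first, the USPM record adds nothing and is flagged as a duplicate.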
A CLOSER LOOK AT
POWERSEARCH PATENT
GROUPING
The search in Figure 1 uses several features of PowerSearch to
collect and group a set of patents mentioning minoxidil. The goal of the
search was to obtain a manageably large set of records from several
companies and countries and examine the results. I wanted to obtain
Derwent's family information for every invention found by searches of
Chemical Abstracts, CLAIMS, Current Patents, INPANEW and WPAT.
To do this, I extracted the patent, application and priority numbers
from the search results of the various files. The WPAT search combined
these results with the additional records found by a single file search of
WPAT. I extracted and re-searched the Derwent priority numbers to find
additional related records. All 518 of the records from all of the files
were then merged. After reordering to make WPAT the first file, an ID
display reported 227 patent groups containing 243 duplicate patent records.
I studied these groups to become more familiar with grouping.
I was impressed by the success of the cross-file searching. The
search of Chemical Abstracts had patents from 13 countries and covered the
time period 1975 through November 1993. All but one patent group had one or
more records from WPAT. The exception was an Australian invention not in
the WPAT database. An INPADOC family search did not show any equivalents in
other countries. Chemical Abstracts and INPADOC are the only ORBIT
databases that cover this invention.
The records in each patent group are sorted in file order. Duplicate
records are clearly labeled. The groups are arranged in reverse
chronological order for the newest record from the first file. All groups
with records from the first file print first. In the minoxidil search, the
last group did not have a WPAT record. Browsing the end of the file makes
these easy to spot. This technique also spotlights non-convention cases
that have not been tied to their convention equivalent, and so helps the
searcher make such connections.
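The display order just described can be expressed as a short sorting routine. This sketch is an assumption-laden illustration: the record layout (dictionaries with 'file' and 'date' keys), the ISO-style date strings, and the order_groups name are all hypothetical, and the order of records within a single file is left as given because the text does not specify it.

```python
def order_groups(groups, file_order):
    """Order patent groups for display.

    Within a group, records follow file order. Groups holding a record from
    the first file come first, newest first-file record on top; groups with
    no first-file record go to the end.
    """
    first_file = file_order[0]

    def newest_first_file_date(group):
        dates = [rec["date"] for rec in group if rec["file"] == first_file]
        return max(dates) if dates else None

    with_first = [g for g in groups if newest_first_file_date(g) is not None]
    without = [g for g in groups if newest_first_file_date(g) is None]
    with_first.sort(key=newest_first_file_date, reverse=True)

    return [sorted(g, key=lambda rec: file_order.index(rec["file"]))
            for g in with_first + without]
```

Because groups lacking a first-file record always land at the end, browsing the tail of the output is enough to find inventions that the first file does not cover.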
I didn't find anything wrong with the grouping. WPAT records that
were related to other WPAT records by priority number were nicely tied
together. One case even added a WPAT record that didn't have a common
priority or application number. This group is shown in Figure 2. One of the
12 application numbers in the CLAIMS record is the only link to Derwent
89-070403/10. On their own, the three WPAT records group into two patent
groups. Adding the Chemical Abstracts records does not change this. It's
the CLAIMS record that pulls it all together. The XR cross-reference field
in WPAT for 92-323296/39 does list both of the other records but the
priority link has been omitted from the database. (This omission is also
present in the DIALOG file.)
Another advantage of the merged results is the ease with which
irrelevant Derwent records found by a search for related documents can be
identified. These records share a common priority number but deal with very
different subject matter and belong to different companies. PRINT PATGROUP
pulls them together.
A further value of grouping is the ability to bring together as much
information about an invention as possible and desired. The differently
enhanced titles from several databases combine with indexing aids in
screening search results. The detailed U.S. application data from CLAIMS is
a useful addition to the data from the other files. I have often manually
edited search reports to obtain similar results. The ability to issue a
command and have the editing automatically performed is very helpful. The
direct benefit of grouping is having fewer separate inventions to screen
for relevancy and more data to help evaluate the results.
WISH LIST
PowerSearch in its initial form has succeeded in assisting patent
searchers with retrieval and analysis of patent information. Several
enhancements would make it even better:
* The NBR command must be expanded to operate with true multifile
searching. Database indexes have many inconsistencies such as spelling,
terminology, inventor and assignee names. It is inefficient to specify each
file separately. Providing a multimeaning display for use with truncation
in multifiles would be an acceptable patch for the short term. (This would
work like the ROOT command on other systems.) However, NBR is much more
desirable.
* It would be extremely useful to see grouping evolve so the user can
specify different print fields for each database. ORBIT has indicated that
it is heading in this direction. The next step should be to make it
intelligent and flexible. I do not want repeated copies of the same
assignee information for an invention, but I also do not want to miss cases
where it varies. (I was able to print different formats for each database
in an STN multifile session. I created a format in each file and saved it
under the same name.)
* Can cross-file searching be streamlined or customized? I grew
accustomed to typing PRT SEL SET PN, PR, AP TOSEL xxxx but often made
typing errors.
* Including cross-file commands in HISTORY would be valuable.
CONCLUSIONS
In 1983, Stu Kaback [6] fantasized about a master file of patent
information incorporating the inputs of CAS, Derwent, API and IFI. At that
time, cross-file operations could be used to perform some aspects of this
concept. In 1987, when OneSearch had just been announced, he mused:
Now, file cluster searching, or OneSearch, is likely to save time
used in searching multiple files one after the other, but as soon as I saw
it, I wondered about a possible enhancement. When you're cluster searching,
each of the records is merely the record from an individual database. What
about the possibility of an algorithm that would tell the computer to
combine all records with a common bibliographic element, most probably the
priority number for patent files. Then each patent would have in one
superrecord the Derwent codes, the CAS Registry numbers, the CLAIMS-CDB
terms and roles, the API terms, the WPI family, everything . . . Here one
would have the best of all worlds, in which every patent family would be
searchable by ALL of the attributes assigned to it by all databases. [7]
ORBIT's ability to group duplicate and related patent records from
various databases is another significant step toward this ideal world. No
other online service has successfully brought patent records dealing with
the same invention together. The searcher still needs to do meticulous
cross-file searching but the final printout organizes the results in a
useful way that previously required extensive post-processing to
accomplish. The 'super record' concept for searching and displaying is not
fully implemented, but it is getting closer. For many patent searches,
ORBIT should be the databank of choice.
CROSS-FILE SEARCHING
AND IMPLIED FILE
MERGERS
Cross-file searching should not be confused with multifile searching.
In multifile searching a group of files may be searched simultaneously, but
each file is queried with only its own parameters, or with a common set of
parameters. In cross-file searching, parameters extracted from one file are
brought to bear against a second file, enabling the selection of those
references that include both Term A from File AA and Term B from File BB,
thus allowing the viewpoints of the two files to interact. Multifile
searching is certainly convenient, but cross-file searching is a far more
powerful technique.
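The distinction can be sketched in Python. In this illustration, which is not any vendor's implementation, each file is a list of dictionaries with 'terms' and 'numbers' keys, and the function names are hypothetical. The set passed from one file to the other plays the role of a SELECT list of patent numbers.

```python
def search(file_records, term):
    """Single-file search: records indexed with the given term."""
    return [rec for rec in file_records if term in rec["terms"]]


def extract_numbers(hits):
    """Collect the patent numbers from a hit set (the 'SELECT list')."""
    return {number for rec in hits for number in rec["numbers"]}


def cross_file(file_aa, term_a, file_bb, term_b):
    """Records in File BB that carry Term B and match a File AA hit on Term A."""
    select_list = extract_numbers(search(file_aa, term_a))
    return [rec for rec in search(file_bb, term_b)
            if select_list & set(rec["numbers"])]
```

A multifile search would simply run search() over each file independently; only cross_file() lets indexing from File AA constrain what is retrieved from File BB.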
File mergers to produce the utopian All-the-Parameters file of my
fantasies are unlikely to happen. The WPI-APIPAT merger gave us a wonderful
product, but the experience of ORBIT, Derwent, and API in combining these
databases showed that such mergers are highly complicated (read costly),
even in a situation that seemed made-to-order for such a merger. But
PowerSearch can, in some instances, permit virtual file mergers, a concept
I first heard enunciated by Nancy Lambert of Chevron.
Search File AA with a set of parameters including Term A, and search
File BB with a set of parameters including Term B. Merge the two sets and
PRINT GROUPS. Those groups that include a reference from File AA and its
equivalent from File BB turn out to include both Term A and Term B. It's
very much like a merger of the two files - Nancy's virtual merger.
The concept has tremendous potential for high-powered searching.
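A minimal sketch of the virtual merger, under stated assumptions: each hit is a dictionary with an 'id' and a single 'priority' number (real records can carry several), and virtual_merge is a hypothetical name. Grouping both hit sets by priority and keeping only the groups fed by both files yields the inventions that satisfy Term A in File AA and Term B in File BB.

```python
from collections import defaultdict


def virtual_merge(hits_aa, hits_bb):
    """Group two hit sets by priority number; keep groups present in both.

    hits_aa: File AA records retrieved with Term A
    hits_bb: File BB records retrieved with Term B
    Returns {priority: {"AA": [ids], "BB": [ids]}} for shared inventions.
    """
    groups = defaultdict(lambda: {"AA": [], "BB": []})
    for label, hits in (("AA", hits_aa), ("BB", hits_bb)):
        for rec in hits:
            groups[rec["priority"]][label].append(rec["id"])
    return {priority: members for priority, members in groups.items()
            if members["AA"] and members["BB"]}
```

No combined file is ever built; the grouping step alone produces the effect of searching one.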
ACKNOWLEDGMENT
I want to thank ORBIT and STN for allowing me to freely explore their
systems. A special thanks to Sarah Radick of ORBIT for her helpfulness and
quick response to all of my thoughts and questions.
REFERENCES
[1] Gregory, Andrew. 'Maxwell Online And The Future . . . InfoPro
Technologies.' ONLINE 16, No. 6 (November 1992): pp. 10-11.
[2] InfoPro Technologies announced the sale of ORBIT Online Service to
Questel on February 1, 1994. See the newspages in ONLINE, March 1994, p.
13.
[3] Van Camp, Ann J. 'StarSearch For The Health Sciences.' DATABASE
14, No. 5 (October 1991): pp. 99-101.
[4] Miller, Carmen. 'Detecting Duplicates: A Searcher's Dream Come
True.' ONLINE 14, No. 4 (July 1990): pp. 27-34.
[5] PowerSearch System Reference Guide (November 1993).
[6] Kaback, Stuart M. 'Online Patent Searching: The Realities.' ONLINE
7, No. 4 (July 1983): pp. 22-31.
[7] Kaback, Stuart M. 'What's News?' World Patent Information 9, No. 4
(1987): pp. 258-259.