Download the files: IPCSICFILES.zip
As part of my research, I have constructed a
concordance that links the International Patent Classification (IPC) system to the U.S. Standard Industrial
Classification (SIC) system at the four-digit SIC level.
This document is intended to enable scholars to use
the concordance to further their research.
It describes the text file that contains the concordance,
IPCSICFINALv5.txt. Specifically, it
defines the fields in this file, and provides a “heads-up” for some of the
peccadilloes associated with these data.
Although this document explains briefly how I
constructed the concordance file, it is not intended to provide a full-blown description
of the theory and methodology underlying the concordance construction. For more detail, please refer to chapter 4 of
Silverman (2002).
I have also uploaded the intermediate files used in
the creation of IPCSICFINALv5.txt, in case interested scholars wish to check or
change anything. PLEASE LET ME KNOW IF
YOU MAKE CHANGES TO YOUR VERSION OF THESE FILES, AS I WOULD LIKE TO KEEP TRACK
OF ANY FORKS IN THE DATA! Thanks.
Here are the steps I performed to create
IPCSICFINALv5.txt:
1. I identified all patents in the Canadian Patent Office’s PATDAT database whose International Patent Classes (IPCs) were assigned according to version five of the IPC system. These are virtually all patents granted between 1990 and 1993. There are about 148,000 patents in this population.
2. Each patent in the population has a primary SIC of Use and a primary SIC of Manufacture, both assigned by Canadian patent examiners who have been specially trained by people in Statistics Canada. For information on the assignment of SIC codes to patents in the PATDAT database, see Ellis (1981). [Note #1: A small number of patents do not have an SIC of Use or SIC of Manufacture assigned, and are excluded from this concordance. Note #2: The Canadian Patent Office ceased assigning SICs to patents after 1993.]
3. For each IPC c, I constructed a frequency distribution reflecting the proportion of patents assigned to IPC c that were also assigned to industry j, where j is a Canadian four-digit SIC class. I constructed separate frequency distributions for SICs of Use and SICs of Manufacture. These are recorded in the files IPCSICUv5.txt and IPCSICMv5.txt, respectively. The formats are:
IPCSICUv5.txt
IPC – IPC class and subclass [e.g., A01B]
CSIC – Canadian SIC [e.g., 2319]
UFREQUENCY – USICPATS/UTOTPATS
USICPATS – Total patents assigned to this IPC and assigned to this SIC of Use
UTOTPATS – Total patents assigned to this IPC, excluding patents not assigned to any SIC of Use
TOTPATS2 – Total patents assigned to this IPC

IPCSICMv5.txt
IPC – IPC class and subclass [e.g., A01B]
CSIC – Canadian SIC [e.g., 2319]
MFREQUENCY – MSICPATS/MTOTPATS
MSICPATS – Total patents assigned to this IPC and assigned to this SIC of Manufacture
MTOTPATS – Total patents assigned to this IPC, excluding patents not assigned to any SIC of Manufacture
TOTPATS2 – Total patents assigned to this IPC
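The frequency calculation in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the posted files: the input format (a list of `(ipc, csic_of_use)` pairs, with `None` for a patent that has no SIC of Use), the function name `use_frequencies`, and the sample records (including CSIC 3111) are all hypothetical. The same logic would apply to SICs of Manufacture.

```python
from collections import Counter


def use_frequencies(patents):
    """Build IPCSICUv5-style rows from (ipc, csic_of_use) pairs.

    `patents` is a hypothetical in-memory input; csic_of_use is None
    when the patent was not assigned any SIC of Use.
    """
    sic_pats = Counter()   # (ipc, csic) -> USICPATS
    tot_pats = Counter()   # ipc -> UTOTPATS (patents with an SIC of Use)
    tot_pats2 = Counter()  # ipc -> TOTPATS2 (all patents in the IPC)

    for ipc, csic in patents:
        tot_pats2[ipc] += 1
        if csic is None:
            continue  # counted in TOTPATS2 only
        sic_pats[(ipc, csic)] += 1
        tot_pats[ipc] += 1

    rows = []
    for (ipc, csic), n in sorted(sic_pats.items()):
        rows.append({
            "IPC": ipc,
            "CSIC": csic,
            "UFREQUENCY": n / tot_pats[ipc],
            "USICPATS": n,
            "UTOTPATS": tot_pats[ipc],
            "TOTPATS2": tot_pats2[ipc],
        })
    return rows


# Made-up sample: four patents in IPC A01B, one without an SIC of Use.
sample = [("A01B", "2319"), ("A01B", "2319"), ("A01B", "3111"), ("A01B", None)]
rows = use_frequencies(sample)
for row in rows:
    print(row)
```

Within each IPC, the UFREQUENCY values sum to one because the denominator (UTOTPATS) excludes patents with no SIC of Use.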
4. The next trick is to link these Canadian SICs to U.S. SICs. The good
news is that there is a Canada SIC – U.S. SIC concordance published jointly by
Industry
CSICUSIC.txt [3,631 records]
CSIC – Canadian SIC [e.g., 2442]
USIC – U.S. SIC [e.g., 3011]
NUMSICS – Number of U.S. four-digit SICs that are associated with this CSIC
The bad news is that many of the Canadian SICs map into multiple U.S. SICs. For example, Canadian SIC 2442 is linked in the concordance to U.S. SICs 2253, 2337, and 2339. This makes it difficult to allocate patents to these U.S. SICs. I have used two ways to account for this in my work. First, I have simply divided a CSIC’s patents equally among the U.S. SICs to which it maps. So if, for a given IPC, CSIC 2442 has a UFREQUENCY of 0.3, then U.S. SICs 2253, 2337, and 2339 would each have a UFREQUENCY of 0.1. Second, for manufacturing SICs only, I have used value-of-shipments data, by four-digit U.S. SIC, to weight this allocation. If CSIC 2442 has a UFREQUENCY of 0.3, and if U.S. SIC 2253 had sales in a given year of $1000, while U.S. SICs 2337 and 2339 each had sales of $500, then SIC 2253 would have a UFREQUENCY of 0.15 (0.3 × 1000/2000) and SICs 2337 and 2339 would each have a UFREQUENCY of 0.075 (0.3 × 500/2000). The posted files use only the first method of allocation. Please feel free to contact me for data to do the second method, but because I only have data for manufacturing industries (SICs 2000-3999), this can be tricky.
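The two allocation rules in step 4 can be written out as a short Python sketch that reproduces the CSIC 2442 example. The function names `allocate_equally` and `allocate_by_shipments` are hypothetical, and the shipment values are the illustrative dollar figures from the example, not real data.

```python
def allocate_equally(freq, us_sics):
    """Split a CSIC's frequency evenly across its linked U.S. SICs."""
    share = freq / len(us_sics)
    return {sic: share for sic in us_sics}


def allocate_by_shipments(freq, shipments):
    """Split a CSIC's frequency in proportion to each U.S. SIC's shipments.

    `shipments` maps U.S. SIC -> value of shipments for one year.
    """
    total = sum(shipments.values())
    return {sic: freq * value / total for sic, value in shipments.items()}


# CSIC 2442 with a UFREQUENCY of 0.3, linked to U.S. SICs 2253, 2337, 2339.
equal = allocate_equally(0.3, ["2253", "2337", "2339"])
weighted = allocate_by_shipments(0.3, {"2253": 1000, "2337": 500, "2339": 500})

print({sic: round(f, 3) for sic, f in equal.items()})
print({sic: round(f, 3) for sic, f in weighted.items()})
```

Up to floating-point rounding, the first rule gives 0.1 to each U.S. SIC, and the second gives 0.15 to SIC 2253 and 0.075 to each of SICs 2337 and 2339. Under either rule the allocated shares sum back to the original 0.3.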
5. I then linked the IPCSICUv5.txt (and IPCSICMv5.txt) file to the CSICUSIC.txt file. The resulting file is IPCSICFINALv5.txt. The format is as follows:
IPC – IPC class and subclass [e.g., A01B]
SIC – U.S. SIC [e.g., 3011]
MFGFRQ – MFREQUENCY/NUMSICS (this is the frequency with which patents assigned to this IPC are also assigned to this SIC of Manufacture)
USEFRQ – UFREQUENCY/NUMSICS (this is the frequency with which patents assigned to this IPC are also assigned to this SIC of Use)
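The linking in step 5 can be sketched as a join on CSIC followed by division by NUMSICS. The sketch below works on in-memory records rather than the posted files, because the delimiter and column layout of the text files are not described here. The function name `link_to_us_sic`, the invented CSIC 9999 and its mapping, and the choice to sum contributions when two CSICs map to the same U.S. SIC are all assumptions made for illustration, not a description of how IPCSICFINALv5.txt was actually built.

```python
from collections import defaultdict


def link_to_us_sic(freq_rows, csic_usic_rows, freq_field):
    """Join IPC-CSIC frequencies to U.S. SICs using equal allocation.

    freq_rows:      dicts with keys IPC, CSIC and `freq_field`
                    (UFREQUENCY or MFREQUENCY).
    csic_usic_rows: dicts with keys CSIC, USIC, NUMSICS.
    Returns {(IPC, USIC): frequency}.
    """
    # CSIC -> list of (USIC, NUMSICS) links
    links = defaultdict(list)
    for row in csic_usic_rows:
        links[row["CSIC"]].append((row["USIC"], row["NUMSICS"]))

    result = defaultdict(float)
    for row in freq_rows:
        for usic, numsics in links.get(row["CSIC"], []):
            # Assumption: contributions from different CSICs that reach
            # the same U.S. SIC are added together.
            result[(row["IPC"], usic)] += row[freq_field] / numsics
    return dict(result)


# CSIC 2442 follows the example in step 4; CSIC 9999 is invented.
csic_usic = [
    {"CSIC": "2442", "USIC": "2253", "NUMSICS": 3},
    {"CSIC": "2442", "USIC": "2337", "NUMSICS": 3},
    {"CSIC": "2442", "USIC": "2339", "NUMSICS": 3},
    {"CSIC": "9999", "USIC": "2253", "NUMSICS": 1},
]
use_rows = [
    {"IPC": "A01B", "CSIC": "2442", "UFREQUENCY": 0.3},
    {"IPC": "A01B", "CSIC": "9999", "UFREQUENCY": 0.7},
]

usefrq = link_to_us_sic(use_rows, csic_usic, "UFREQUENCY")
for key, value in sorted(usefrq.items()):
    print(key, round(value, 3))
```

Calling the same function with MFREQUENCY rows would give the MFGFRQ column. Because each CSIC's frequency is split across exactly NUMSICS U.S. SICs, the frequencies for an IPC still sum to one after linking, provided every CSIC appears in the concordance.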
Additional notes:
Acknowledgements:
I am grateful to the Alfred P. Sloan Foundation, the
Connaught Foundation, the Social Sciences and
Humanities Research Council (SSHRC), and
References:
Ellis, E.D. (1981), “Canadian Patent Data Base: The philosophy, construction, and uses of the Canadian patent data base PATDAT,” World Patent Information 3(1): 13-18.
Luque, A. (2000), “An option-value approach to technology adoption in
McGahan, A.M. and B.S. Silverman (2001), “How does innovative activity change as industries mature?” International Journal of Industrial Organization 19(7): 1141-1160.
Mowery, D.C. and A.A. Ziedonis (2001), “The geographic reach of market and non-market channels of technology transfer: Comparing citations and licenses of university patents,” NBER Working Paper #8568, National Bureau of Economic Research,
Silverman, B.S. (1996), “Technological Assets and the Logic of Corporate Diversification,” PhD dissertation,
Silverman, B.S. (1999), “Technological resources and the direction of corporate diversification,” Management Science 45(8): 1109-1124.
Silverman, B.S. (2002), Technological Resources and the Logic of Corporate Diversification,