Documentation for International Patent Classification – U.S. SIC concordance

 

Download the files:  IPCSICFILES.zip

 

 

  • IMPORTANT NOTE! Prior to September 20, 2004, the files IPCSICUv5.txt and IPCSICMv5.txt had their titles reversed, and the MFGFRQ and USEFRQ fields in IPCSICFINALv5.txt were reversed as well!  So if you downloaded the data before that date, you will want to download the corrected versions.  The relevant research assistants have been severely beaten about the face and neck.  Thanks to Will Kerr, PhD candidate at MIT, for discovering this error.

 

 

As part of my research, I have constructed a concordance that links the International Patent Classification (IPC) system to the U.S. Standard Industrial Classification (SIC) system at the four-digit SIC level.  Although U.S. patents are assigned to U.S. Patent Classes, each U.S. patent is also assigned an IPC.  Thus it is feasible to use this concordance to link U.S. patents to those SICs where the patents are likely to provide value.  The concordance has been used by various scholars to assess the specific industries in which firms have technological strength (Silverman 1999); patenting activity through the industry life cycle (McGahan & Silverman 2001); industry-specific technological uncertainty (Luque 2000); and industry-specific effects in university-industry technology transfer (Mowery & Ziedonis 2001).

 

This document is intended to enable scholars to use the concordance to further their research.  It describes the text file that contains the concordance, IPCSICFINALv5.txt.  Specifically, it defines the fields in this file, and provides a “heads-up” for some of the peccadilloes associated with these data. 

 

Although this document explains briefly how I constructed the concordance file, it is not intended to provide a full-blown description of the theory and methodology underlying the concordance construction.  For more detail, please refer to chapter 4 of Silverman (2002).

 

I have also uploaded the intermediate files used in the creation of IPCSICFINALv5.txt, in case interested scholars wish to check or change anything.  PLEASE LET ME KNOW IF YOU MAKE CHANGES TO YOUR VERSION OF THESE FILES, AS I WOULD LIKE TO KEEP TRACK OF ANY FORKS IN THE DATA! Thanks.

 

Here are the steps I performed to create IPCSICFINALv5.txt:

 

1. Identify all patents in the Canadian Patent Office’s PATDAT database whose International Patent Classes (IPCs) were assigned according to version five of the IPC system.  These are virtually all patents granted between 1990 and 1993.  There are about 148,000 patents in this population.  

 

 

2. Each patent in the population has a primary SIC of Use and a primary SIC of Manufacture, both assigned by Canadian Patent Examiners who have been specially trained by people in Statistics Canada.  For information on the assignment of SIC codes to patents in the PATDAT database, see Ellis (1981). [Note #1: A small number of patents do not have an SIC of Use or SIC of Manufacture assigned, and are excluded from this concordance. Note #2: The Canadian Patent Office ceased assigning SICs to patents after 1993.]

 

 

3. For each IPC c, I constructed a frequency distribution reflecting the proportion of patents assigned to IPC c that were assigned to industry j, where j is a Canadian four-digit SIC class.  I constructed separate frequency distributions for SICs of Use and SICs of Manufacture.  These are recorded in the files IPCSICUv5.txt and IPCSICMv5.txt, respectively.  The formats are:

 

IPCSICUv5.txt [15,494 records]

IPC                  – IPC class and subclass [e.g., A01B]

CSIC                – Canadian SIC [e.g., 2319]

UFREQUENCY  – USICPATS/UTOTPATS

USICPATS      – Total patents assigned to this IPC and assigned to this SIC of Use

UTOTPATS     – Total patents assigned to this IPC, excluding patents not assigned to any SIC of Use

TOTPATS2      – Total patents assigned to this IPC

 

IPCSICMv5.txt [9,770 records]

IPC                  – IPC class and subclass [e.g., A01B]

CSIC                – Canadian SIC [e.g., 2319]

MFREQUENCY  – MSICPATS/MTOTPATS

MSICPATS      – Total patents assigned to this IPC and assigned to this SIC of Manufacture

MTOTPATS    – Total patents assigned to this IPC, excluding patents not assigned to any SIC of Manufacture

TOTPATS2      – Total patents assigned to this IPC
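
As an illustration of step 3, here is a minimal Python sketch of how the SIC-of-Use fields could be computed from a list of patents.  It is not the code used to build the posted files.  The function name, the input layout (pairs of IPC and Canadian SIC of Use, with None for a missing SIC), and the toy records are all assumptions made for the example; CSIC 0111 in particular is an invented code.

```python
from collections import Counter


def use_frequencies(patents):
    """Build IPCSICUv5-style rows from (ipc, csic_of_use) pairs.

    `patents` is a hypothetical input layout: one pair per patent,
    with csic_of_use set to None when no SIC of Use was assigned.
    """
    sic_pats = Counter()   # (IPC, CSIC) -> USICPATS
    tot_pats = Counter()   # IPC -> UTOTPATS (patents with an SIC of Use)
    tot_pats2 = Counter()  # IPC -> TOTPATS2 (all patents in the IPC)

    for ipc, csic in patents:
        tot_pats2[ipc] += 1
        if csic is None:
            continue  # counted in TOTPATS2 only
        sic_pats[(ipc, csic)] += 1
        tot_pats[ipc] += 1

    rows = []
    for (ipc, csic), n in sorted(sic_pats.items()):
        rows.append({
            "IPC": ipc,
            "CSIC": csic,
            "UFREQUENCY": n / tot_pats[ipc],
            "USICPATS": n,
            "UTOTPATS": tot_pats[ipc],
            "TOTPATS2": tot_pats2[ipc],
        })
    return rows


# Toy records, invented for illustration only.
toy_patents = [
    ("A01B", "2319"),
    ("A01B", "2319"),
    ("A01B", "0111"),
    ("A01B", None),
    ("A01C", "0111"),
]

for row in use_frequencies(toy_patents):
    print(row)
```

The same logic would apply to SICs of Manufacture, with the M-prefixed field names.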

 

 

4. The next trick is to link these Canadian SICs to U.S. SICs.  The good news is that there is a Canada SIC – U.S. SIC concordance published jointly by Industry Canada and the US Department of Commerce (U.S. DoC 199#).  I transcribed this into a data file called CSICUSIC.txt.  The format is:

 

CSICUSIC.txt   [3,631 records]

CSIC – Canadian SIC [e.g., 2442]

USIC – U.S. SIC [e.g., 3011]

NUMSICS – Number of US four-digit SICs that are associated with this CSIC

 

The bad news is that many of the Canadian SICs map into multiple U.S. SICs.  For example, Canadian SIC 2442 is linked in the concordance to U.S. SICs 2253, 2337, and 2339.  This makes it difficult to allocate patents to these U.S. SICs.  I have used two ways to account for this in my work.  First, I have simply divided a CSIC’s patents equally among the U.S. SICs to which it maps.  So if, for a given IPC, CSIC 2442 has a UFREQUENCY of 0.3, then U.S. SICs 2253, 2337, and 2339 would each have a UFREQUENCY of 0.1.  Second, for manufacturing SICs only, I have used value of shipments data, by four-digit U.S. SIC, to weight this allocation.  If CSIC 2442 has a UFREQUENCY of 0.3, and if U.S. SIC 2253 had sales in a given year of $1000, while U.S. SICs 2337 and 2339 each had sales of $500, then SIC 2253 would have a UFREQUENCY of 0.15 and SICs 2337 and 2339 would each have a UFREQUENCY of 0.075.  The posted files use only the first method of allocation.  Please feel free to contact me for data to do the second method, but because I only have data for manufacturing industries (SICs 2000-3999), this can be tricky.
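
The two allocation rules in step 4 can be sketched in a few lines of Python.  This is only an illustration of the arithmetic in the CSIC 2442 example above; the function names and the dictionary layout for shipments are assumptions, not part of the posted files.

```python
def allocate_equal(freq, us_sics):
    """First method: split a CSIC's frequency equally among its U.S. SICs."""
    share = freq / len(us_sics)
    return {sic: share for sic in us_sics}


def allocate_weighted(freq, shipments):
    """Second method: split in proportion to value of shipments.

    `shipments` maps each U.S. SIC to its value of shipments (or sales)
    for the chosen year; this layout is assumed for the example.
    """
    total = sum(shipments.values())
    return {sic: freq * value / total for sic, value in shipments.items()}


# CSIC 2442 with a UFREQUENCY of 0.3, as in the text.
equal = allocate_equal(0.3, ["2253", "2337", "2339"])
weighted = allocate_weighted(0.3, {"2253": 1000, "2337": 500, "2339": 500})

print(equal)     # each U.S. SIC gets roughly 0.1
print(weighted)  # roughly 0.15 for 2253, 0.075 each for 2337 and 2339
```

Under either rule the allocated shares sum back to the original CSIC frequency, so the distribution for an IPC still sums to one after the conversion.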

 

 

5. I then linked IPCSICUv5.txt and IPCSICMv5.txt to CSICUSIC.txt.  The resulting file is IPCSICFINALv5.txt.  The format is as follows:

 

IPCSICFINALv5.txt     [76,944 records]

IPC                  – IPC class and subclass [e.g., A01B]

SIC                  – US SIC [e.g., 3011]

MFGFRQ         – MFREQUENCY/NUMSICS (the frequency with which patents assigned to this IPC are also assigned to this SIC of Manufacture)

USEFRQ          – UFREQUENCY/NUMSICS (the frequency with which patents assigned to this IPC are also assigned to this SIC of Use)
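
For readers who want to reproduce the linking in step 5 from the intermediate files, the following Python sketch shows one way the join could work.  It is not the code used to build IPCSICFINALv5.txt.  The function name and row layout are assumptions, and so is the choice to add up contributions when several Canadian SICs map to the same U.S. SIC, which this document does not spell out.  CSIC 2443 and its mapping are invented for the example.

```python
from collections import defaultdict


def link_to_us_sic(ipc_csic_rows, csic_usic_rows, freq_field):
    """Convert IPC-by-CSIC frequencies into IPC-by-U.S.-SIC frequencies.

    Each frequency is divided by NUMSICS (equal allocation).  Summing
    the pieces that land on the same (IPC, U.S. SIC) pair is an
    assumption of this sketch.
    """
    by_csic = defaultdict(list)  # CSIC -> [(USIC, NUMSICS), ...]
    for row in csic_usic_rows:
        by_csic[row["CSIC"]].append((row["USIC"], row["NUMSICS"]))

    linked = defaultdict(float)  # (IPC, USIC) -> allocated frequency
    for row in ipc_csic_rows:
        for usic, numsics in by_csic.get(row["CSIC"], []):
            linked[(row["IPC"], usic)] += row[freq_field] / numsics
    return dict(linked)


# CSIC 2442 maps to three U.S. SICs, as in step 4.
# CSIC 2443 and its single mapping are made up for illustration.
csic_usic = [
    {"CSIC": "2442", "USIC": "2253", "NUMSICS": 3},
    {"CSIC": "2442", "USIC": "2337", "NUMSICS": 3},
    {"CSIC": "2442", "USIC": "2339", "NUMSICS": 3},
    {"CSIC": "2443", "USIC": "2253", "NUMSICS": 1},
]
ipc_use = [
    {"IPC": "A01B", "CSIC": "2442", "UFREQUENCY": 0.3},
    {"IPC": "A01B", "CSIC": "2443", "UFREQUENCY": 0.7},
]

usefrq = link_to_us_sic(ipc_use, csic_usic, "UFREQUENCY")
for (ipc, usic), value in sorted(usefrq.items()):
    print(ipc, usic, round(value, 4))
```

Calling the same function on the IPCSICMv5.txt rows with "MFREQUENCY" would give the MFGFRQ side of the file.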

 

 

Additional notes:

 

  • Some Canadian patents are assigned to Canadian SICs at only the 3-digit level.  I have allocated these among all four-digit SICs that fall under the 3-digit Canadian SIC, using the method described in step #4 above.

 

  • I hope eventually to post equivalent files for the third and fourth versions of the IPC, which together cover most of the 1980s.  These will be useful to scholars who are studying industrial activity of the 1980s, since those IPC-SIC concordances should reflect patent-industry linkages during the 1980s more accurately than the concordance from the early 1990s does.  If you are working on research related to the 1980s and would like the earlier versions of the concordance, just email me to nudge me and I will do my best to get them posted sooner rather than later.
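
The first note above, on patents assigned only a 3-digit Canadian SIC, can be illustrated with a short Python sketch.  It assumes that "fall under" means the four-digit Canadian SICs whose first three digits match, and that the allocation is the equal split from step #4; both are readings of the note rather than confirmed details.  The function name and the list of four-digit codes are hypothetical.

```python
def spread_three_digit(freq, csic3, four_digit_csics):
    """Split a 3-digit CSIC's frequency equally among its 4-digit codes.

    Assumes a 4-digit code belongs to the 3-digit group whose digits
    it starts with (an assumption of this sketch).
    """
    children = [c for c in four_digit_csics if c.startswith(csic3)]
    if not children:
        raise ValueError(f"no four-digit CSIC found under {csic3}")
    share = freq / len(children)
    return {c: share for c in children}


# Hypothetical list of four-digit codes; only 2442 appears in this document.
known_csics = ["2441", "2442", "2451"]
print(spread_three_digit(0.4, "244", known_csics))
```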

 

 

 

Acknowledgements:

 

I am grateful to the Alfred P. Sloan Foundation, the Connaught Foundation, the Social Sciences and Humanities Research Council (SSHRC), and Harvard Business School’s Division of Research for research support associated with the projects that generated this concordance.

 

 

 

References

 

Ellis, E.D. (1981), “Canadian Patent Data Base: The philosophy, construction, and uses of the Canadian patent data base PATDAT,” World Patent Information 3(1): 13-18.

 

Luque, A. (2000), “An option-value approach to technology adoption in U.S. manufacturing: Evidence from plant-level data,” Center for Economic Studies Working Paper CES-WP-00-12, July.

 

McGahan, A.M. and B.S. Silverman (2001), “How does innovative activity change as industries mature?” International Journal of Industrial Organization 19(7): 1141-1160.

 

Mowery, D.C. and A.A. Ziedonis (2001), “The geographic reach of market and non-market channels of technology transfer: Comparing citations and licenses of university patents,” NBER Working Paper #8568, National Bureau of Economic Research, Cambridge, MA.

 

Silverman, B.S. (1996), “Technological Assets and the Logic of Corporate Diversification,” PhD dissertation, University of California at Berkeley, Haas School of Business.

 

Silverman, B.S. (1999), “Technological resources and the direction of corporate diversification,” Management Science 45(8): 1109-1124.

 

Silverman, B.S. (2002), Technological Resources and the Logic of Corporate Diversification, London, UK: Routledge.