Identifying Valuable Patents: A Deep Learning Approach
Big data is increasingly available in all areas of manufacturing, which creates value by enabling a competitive, data-driven economy. Increased data availability presents an opportunity to introduce the next generation of innovative technologies. Firms invest in innovation and patents to gain and sustain competitive advantage. Consequently, the valuation of patents is a key determinant of economic growth, since patents are an important innovation indicator. Given the surge in patenting throughout the world, interest in the value of patents has grown significantly. Traditionally, studies on patent value have relied on limited data restricted to a specific technology area, using methods such as regression and mostly numeric and binary categoric data types. We define intellectual property intelligence (IPI) as the data science of analysing large amounts of IP information, specifically patent data, with artificial intelligence (AI) methodologies to discover relationships and trends in the data for decision making.
With the rise of AI and the ability to analyse larger patent datasets, we develop an AI deep learning methodology for the valuation of patents. To do so, we build a large USPTO dataset consisting of all granted patents from 1976 to 2019: (i) we collect, clean, collate and pre-process all the data from the USPTO (and the OECD patent quality indicators database); (ii) we transform the data into numeric, categoric and text features that can be input to the deep learning model. More specifically, we transform the text (abstract, claims, summary, title) into feature vectors using our developed Doc2Vec vector space model (VSM), which we assess using t-distributed stochastic neighbour embedding (t-SNE) visualisation. The dataset is made publicly available so that researchers can run fairly complex data analyses efficiently and effectively.
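The feature transformation in step (ii) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the field names (num_claims, ipc_section, text_vec) and the toy records are hypothetical, and the text vector is assumed to have been produced beforehand by a document embedding model such as Doc2Vec.

```python
import math

def standardise(values):
    """Scale a numeric column to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for constant columns
    return [(v - mean) / std for v in values]

def one_hot(values):
    """One-hot encode a categoric column; returns (vocabulary, encoded rows)."""
    vocab = sorted(set(values))
    index = {cat: i for i, cat in enumerate(vocab)}
    rows = [[1.0 if index[v] == i else 0.0 for i in range(len(vocab))]
            for v in values]
    return vocab, rows

def build_feature_matrix(records):
    """Concatenate numeric, categoric and (precomputed) text features per patent."""
    claims = standardise([r["num_claims"] for r in records])
    _, sections = one_hot([r["ipc_section"] for r in records])
    return [[c] + s + list(r["text_vec"])
            for c, s, r in zip(claims, sections, records)]

# Hypothetical toy records; text_vec stands in for a precomputed embedding.
patents = [
    {"num_claims": 10, "ipc_section": "G", "text_vec": [0.1, -0.2]},
    {"num_claims": 20, "ipc_section": "H", "text_vec": [0.0, 0.3]},
    {"num_claims": 30, "ipc_section": "G", "text_vec": [-0.4, 0.5]},
]
features = build_feature_matrix(patents)
```

Each row of features holds one standardised numeric value, a one-hot block for the categoric field and the text embedding, which is the kind of flat numeric input a feed-forward network expects.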
Using this dataset, we build deep learning models to identify valuable patents. The models are deep and wide feed-forward artificial neural networks (ANN) that use dropout, L2 penalty and batch normalisation for regularisation, and they forecast the value of patents with 12 ex-post patent value output proxies. These comprise grant_lag, generality and quality_index_4, together with forward citations, generality_index and renewals, each measured over three time horizons (t4, t8, t12). We associate each patent value proxy with its respective patent value dimension (economic, strategic or technological). We forecast patent value using ex-ante patent value input determinants for a wide range of technological areas (using the IPC classes) and time horizons (short term in t4, medium term in t8, and long term in t12).
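To make the horizon-based proxies concrete, the sketch below counts forward citations within each horizon and computes a Herfindahl-style generality score. Two assumptions are made that the text does not confirm: the horizons t4, t8 and t12 are read as 4, 8 and 12 years after the grant year, and generality is taken as one minus the sum of squared shares of citing technology classes. The OECD indicators used in the study may be defined differently.

```python
from collections import Counter

# Assumption: horizons are measured in years after the grant year.
HORIZONS = {"t4": 4, "t8": 8, "t12": 12}

def forward_citations_by_horizon(grant_year, citing_years):
    """Count citations received within each horizon of the grant year."""
    return {
        name: sum(1 for y in citing_years if 0 <= y - grant_year <= span)
        for name, span in HORIZONS.items()
    }

def generality(citing_classes):
    """Herfindahl-style generality: 1 - sum of squared citing-class shares."""
    total = len(citing_classes)
    if total == 0:
        return 0.0  # assumption: patents without citations get a score of zero
    shares = Counter(citing_classes)
    return 1.0 - sum((n / total) ** 2 for n in shares.values())

# Hypothetical patent granted in 2000, cited in the listed years.
counts = forward_citations_by_horizon(2000, [2001, 2003, 2006, 2009, 2015])
# Hypothetical technology classes of four citing patents.
spread = generality(["G06", "G06", "H04", "A61"])
```

Under these assumptions, a patent cited by many different technology classes scores close to one, while a patent cited only within a single class scores zero.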
We evaluate all our models using a variety of strategies (out-of-time test, out-of-sample test, k-fold and random split cross-validation) and transparently report all metrics (accuracy, confusion matrix, F1-score, false negative rate, log loss, mean absolute error, precision, recall). Compared to prior art, our models achieve higher accuracy and macro average F1-scores, with low training and validation losses. With increasing prediction horizons, we observe an increase in the macro average F1-scores for several of the proxies. In addition, we find that the composite index, which takes more than one value dimension into consideration, has the highest combined accuracy and macro average F1-score relative to single value dimension patent proxies. Moreover, we find that firms seem to file widely in the short term and then focus their technological competencies on established opportunities. Patent owners seem to renew their patents for fear of losing out. Our study moves away from relatively small datasets limited to a specific technology field and allows for reproducibility in other fields. The models can be tailored to different technology areas, patent value proxies and time horizons.
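Two of the evaluation ingredients named above, the out-of-time test and the macro average F1-score, can be sketched in a few lines. This is an illustrative sketch only: the cutoff year, the record layout and the class labels are hypothetical and do not reproduce the study's evaluation setup.

```python
def out_of_time_split(records, cutoff_year):
    """Train on patents before the cutoff year and test on the later ones."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (a class with no support scores 0)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical records and labels for illustration.
records = [{"year": y} for y in (2015, 2016, 2017, 2018, 2019)]
train, test = out_of_time_split(records, cutoff_year=2018)

y_true = ["high", "high", "low", "low", "low"]
y_pred = ["high", "low", "low", "low", "high"]
score = macro_f1(y_true, y_pred)
```

Because the macro average weights every class equally, it penalises a model that performs well only on the majority class, which matters when valuable patents are rare.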
This study proposes an AI methodology based on deep learning, using deep and wide feed-forward artificial neural networks, to predict the value of patents, with both academic and industrial implications. We predict the value of patents with a variety of output proxies, including composite index proxies, for different technology areas (IPC classifications) and time horizons. Since we use all USPTO granted patents from 1976 to 2019 to train our models, we can apply this approach to patents in any technology field. Our approach enables researchers and industry professionals to value patents using a variety of patent value proxies, based on different value dimensions and tailored to specific technology areas. The proposed AI deep learning approach could effectively support experts (technology, innovation and IP managers etc.) in their decision making by providing fast, low-cost, data-driven intellectual property intelligence (IPI) from big patent data. Firms with limited resources, such as small and medium-sized enterprises (SMEs), can choose representative proxies to forecast patent value estimates, saving resources. Consequently, the proposed approach could efficiently support experts in their patent value judgement, policy makers in directing government investment towards the technological sectors of the future to support the economy, and patent offices in analysing big patent data efficiently and effectively with AI approaches. We anticipate that this research will be of interest to future researchers seeking to expand the emerging field of IPI research, and that it will indicate the skills they will need to perform data-driven IPI research with a variety of data sources and AI deep learning ANN approaches.