Dynamics of technology emergence in innovation networks
We use an original method (see “Data and methods” section) to convert six types of raw innovation data into citation networks that encode the multiplicity of innovation phases. We then perform statistical analysis on these networks to interrogate the accumulation, speed, and division of labour in innovation.
Descriptive statistics
Biomedical innovation is an excellent source of data, not only because the concept of translation is most established in medical research but also because new therapeutics are required by law to be reported and registered. In particular, we focus on eight vaccine approvals for which excellent recent data are available. The eight citation networks we study range in size from 15,282 nodes and 81,716 edges up to 153,446 nodes and 953,002 edges (see table B.2 in the appendix).
Critical innovation path narrates causality in innovation
Graphically, we follow Eq. (1) and plot depth as a function of height. We plot the data from the mRNA vaccine graph in Fig. 2 to illustrate: the bottom left node represents regulatory authorization, and the top right nodes represent the earliest nodes in the network. The diagonal represents nodes which lie on at least one longest path of the DAG, our critical innovation paths, while numerous sub-critical nodes populate the region above the diagonal. From the hue of the diagram, we also observe a cluster of non-critical innovations in the region with low height and low depth. We observe the same pattern in all eight DAGs, as shown in section D of the appendix. To test the theoretical equivalence between the critical schedule path and the longest network path, we set out to inspect: (i) the nodes that are critical, (ii) the order of critical nodes from oldest to newest, and (iii) the nodes that are least critical.
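To make the height, depth, and criticality quantities concrete, the following is a minimal sketch on a toy citation DAG. It rests on assumptions that are not taken from Eq. (1): edges point from the citing document to the cited document, height is the longest path length from the regulatory authorisation to a node, depth is the longest path length from a node to the oldest documents, and criticality is the slack L - height - depth, where L is the longest path length in the DAG. The function names and the toy node labels are hypothetical.

```python
from collections import defaultdict


def longest_path_lengths(nodes, edges):
    """Longest path length (in edges) from any source to each node.

    `edges` is a list of (u, v) pairs meaning u -> v. Nodes are processed
    in topological order (Kahn's algorithm).
    """
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    order = [n for n in nodes if indegree[n] == 0]
    dist = {n: 0 for n in nodes}
    i = 0
    while i < len(order):
        u = order[i]
        i += 1
        for v in successors[u]:
            dist[v] = max(dist[v], dist[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                order.append(v)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return dist


def criticality(nodes, edges):
    """Return (criticality, height, depth) dictionaries keyed by node.

    Assumed convention: edges are (citing, cited), so the regulatory
    authorisation is a source and has height 0.
    """
    height = longest_path_lengths(nodes, edges)
    reversed_edges = [(v, u) for u, v in edges]
    depth = longest_path_lengths(nodes, reversed_edges)
    longest = max(height.values())
    # Assumed slack definition: zero for nodes on a longest path.
    crit = {n: longest - height[n] - depth[n] for n in nodes}
    return crit, height, depth


# Hypothetical toy network: authorisation -> trial -> paper_a -> paper_b,
# plus a side citation authorisation -> survey.
toy_nodes = ["authorisation", "trial", "paper_a", "paper_b", "survey"]
toy_edges = [
    ("authorisation", "trial"),
    ("trial", "paper_a"),
    ("paper_a", "paper_b"),
    ("authorisation", "survey"),
]
toy_crit, toy_height, toy_depth = criticality(toy_nodes, toy_edges)
print(toy_crit)
```

In this toy network the four nodes on the chain have criticality 0, while the side-cited survey has a slack of 2, so it is the kind of node that sits off the diagonal.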
Looking at nodes whose criticality is strictly zero (i.e. the most critical nodes) in each DAG in Fig. 3, we see a mix of nodes representing publications, clinical trials, and regulatory authorisation. If we relax the criticality threshold to consider nodes whose criticality is below 19.5% of the maximum height, we begin to see many more publications, a few more clinical trials, and a few patents in this relaxed critical path region. The order of the critical path, moving from high to low height nodes, always starts with publications (intertwined with a much smaller number of patents when the 19.5% threshold is used), followed by phase 1, 2, and 3 clinical trials, and ends with the regulatory authorisation. This sequence generally proceeds from basic and applied research (publications), through product and process development (patents) and demonstration (clinical trials), to commercial application (regulatory authorisations).
A closer look at the critical path nodes unveils a logical sequence of technical progression. For instance, the Moderna mRNA vaccine DAG has its longest paths formed by early attempts to apply mRNA as an influenza vaccine platform27,28, the use of liposomal delivery systems to enhance the expression kinetics of mRNA vaccines29,30,31, methylation to enhance in vivo antigen expression32, the phase 1-3 clinical trials of mRNA COVID vaccines (NCT04283461, NCT04796896, NCT04847050, NCT04470427), and finally the FDA emergency use authorisation letters33, events that the scientific literature is well aware of34,35. In addition, the longest path of the same DAG also identifies critical discoveries that may have been overlooked: mRNA post-transcriptional modification mechanisms36,37,38,39,40,41 and early basic research about the potential to modify RNA to evade detection by toll-like receptors42,43,44.
We are also interested in the identity of non-critical nodes. Having low criticality in a DAG does not mean the innovation is unimportant; it means the events are not rate-limiting and can perhaps be parallelised. Empirically, in the BioNTech/Pfizer COVID vaccine DAG, for example, nearly all reviewed nodes with low criticality are clinical research about the prevalence of, and risk factors for, diseases non-specific to COVID. Low criticality events are likely non-critical to the approval of the vaccine by a regulatory agency and, in this example, were used to facilitate the design of clinical protocols.
Since we can measure the criticality of every innovation event within our network, we can also compute the propensity of different research agencies to fund critical innovations (table E.1 in appendix).
Calendar time against height reveals innovation speed
The order inherent in a DAG gives a natural “clock” for the innovation process captured by our citation network. It is interesting to see how this network order compares against calendar time. To see this, we plotted the number of days between a document’s date and the final regulatory authorisation against the height of that document in Fig. 3. This shows that calendar date is strongly correlated with network order, but the relationship is non-linear. Broadly speaking, the nodes with the smallest number of calendar days at each height are those on the longest path (i.e. they are nodes with 0 criticality), but why is the rate of change, shown by the red line in Fig. 3, non-constant?
Time and network order in a citation network proceed in the same direction. This is because new documents can only cite older documents and, similarly, innovation is cumulative45. However, their units of progression differ: time proceeds in evenly spaced seconds or days, whereas network order proceeds in citation steps that are non-equidistant. As an analogy, an innovation “clock” is one whose ticks are unevenly spaced. This means that the time gap and frequency of citations can both increase and decrease over the course of an innovation lifecycle.
Visually, Fig. 3a suggests the publication date is rising at a constant rate for most critical nodes, but the rate increases for documents with a normalised height close to 1.0. We have tried to estimate the rate of change of publication date against height in Fig. 3b by smoothing the data for those nodes on a longest path. On small scales, the change in height with calendar time fluctuates as seen in the red line in Fig. 3b. On a larger scale, the trend overall shows that height and time are reasonably correlated. This could show that network order provides an alternative measure of innovation progress compared to calendar time (section F.7 in appendix).
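The rate of change of calendar time with respect to height can be illustrated with a short sketch. The smoothing method used for Fig. 3b is not specified here, so the centred moving average below is an assumption, as are the function name, the `window` parameter, and the choice to keep the smallest day count at each height as the representative of the critical path.

```python
def days_per_height_step(points, window=3):
    """Estimate d(days)/d(height) along the critical path.

    `points` is an iterable of (height, days_before_authorisation) pairs
    for critical nodes. For each height the smallest day count is kept,
    the series is smoothed with a centred moving average of odd length
    `window`, and the rate is a forward finite difference.
    """
    by_height = {}
    for height, days in points:
        by_height[height] = min(days, by_height.get(height, days))

    heights = sorted(by_height)
    days = [by_height[h] for h in heights]

    half = window // 2
    smoothed = []
    for i in range(len(days)):
        lo = max(0, i - half)
        hi = min(len(days), i + half + 1)
        smoothed.append(sum(days[lo:hi]) / (hi - lo))

    rates = [
        (smoothed[i + 1] - smoothed[i]) / (heights[i + 1] - heights[i])
        for i in range(len(heights) - 1)
    ]
    return heights, smoothed, rates


# Hypothetical values, for illustration only (not data from the networks).
example_points = [(0, 0), (1, 300), (1, 350), (2, 700), (3, 1500)]
heights, smoothed, rates = days_per_height_step(example_points, window=1)
print(rates)
```

A negative entry in `rates` would correspond to a step up in height that moves forward in calendar time, and the size of each entry is the number of days one tick of the network "clock" takes at that height.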
The first finding is that the order of node types along the critical path in Fig. 3b shows a clear progression from publications (basic and applied research) to patents (product and process development) to clinical trials (demonstration). However, if we also consider non-zero criticality nodes, we start to see more overlaps between node types. Phenomenologically, this shows that the “stochastic” search for innovation often involves feedbacks across different document types, in line with Kline and Rosenberg’s “chain-linked” model of innovation1. On the other hand, when it comes to critical innovation bottlenecks, a clear progression from research to development to demonstration is observed, agreeing with the “linear” model of innovation16,17.
The second finding is the negative rates of change in Fig. 3b. The existence of forward (future) and backward (past) citations is due to the patenting process often spanning several years. We found that, due to interactions between patent applicants and examiners during patent prosecution, the patent document may be updated with new references. We use the initial patent submission date as our patent publication date. A year or two into the patent process, a recent paper can be added to the application, one that was published after the patent was submitted. As a result, a patent may cite forward in time as well as, more logically, backwards in time. We could use the patent award date as our patent publication date, which would solve the problem in the example just given. However, we would then run into problems with documents that cite a patent that is not yet approved, which is a critical part of the innovation process. This again illustrates why the height of a node in our citation network can be a more consistent record of the logical order in the innovation process.
The third and most interesting finding is that innovation accelerates as a technology matures. If each edge is assumed to be the least publishable unit of knowledge, the decreasing time taken to move up one unit of height means the “innovation clock” is speeding up. The rates of change for critical patents and clinical trials fluctuate between 300 and -300 days, with the negative values indicating the problems of using a single publication date for patents, as these are revised over the several years it takes for a patent to be approved. On the other hand, it takes 50-1300 days for height to increase by one in the early critical journal publications, whereas more recent critical publications, those closer to the regulatory approval, take one year for a height increase of one, indicating an increasing rate of innovation towards the later stage. The observed fluctuations at the beginning and end of the networks might stem from data limitations. These limitations include significant shifts in citation behaviours (changes in node type), subject domains becoming well-defined and focused, and sparser data in the past. The snowball sampling method (appendix B.3) used for network generation also causes data truncation, as we have set a predefined step limit on the network’s size.
An innovation dynamic we observe is that the first derivative of calendar days with respect to network height increases with network height. The conceptual framework of Dosi47 and the case studies of Auerswald and Branscomb48 provide a plausible explanation for this phenomenon: at the nascent stage of a technology, the organisation of innovation (nodes and edges) is largely random. Public research provides the necessary technology “push” to de-risk innovations required for a solid business case by reducing asymmetries of scientific information and motivation. As a technology matures, the accumulation of technological knowledge and the refinement of the direction of search (through clarifications of research purpose and consumer demands, and increased R&D funding) likely facilitate more frequent and targeted innovations towards the vaccine. This is because the innovation actors involved in the vaccine have become more adept at identifying and exploiting the emerging technology’s opportunities. This handover from public to private sector is confirmed by Fig. 4 and discussed in the next section. Across the eight vaccines (Fig. D.3 in the Appendix), making critical progress at later innovation phases always takes less time.
Division of innovation labour is quantifiable via network height
Using the findings above, we demonstrate another real-world utility of innovation order. We portray the frequency of innovator funding as a function of network height to discern the innovation phases that entities are supporting (see section B.5 in the appendix for methods). Figure 4 shows the top five funders by number of nodes funded, three mission-oriented innovation agencies (entities that specifically fund frontier innovations to attain specific goals49), and the top five pharmaceuticals by number of nodes funded. We observe that the largest funders tend to occupy lower height, or early-stage; pharmaceuticals fund a mix of mid- and late-stage documents; whereas mission-oriented innovation agencies are generally more evenly spread across different innovation stages. Looking at calendar time, the median days of mission-oriented agencies and pharmaceuticals (2-19 years) are much closer to regulatory approval than those of large funders (10-27 years). This may indicate the strategies and division of labour among innovation entities: larger funders fund basic and risk-averse research, mission-oriented agencies initiate high-risk research and translate discoveries to other funders, and pharmaceuticals play their obvious commercialization role at the later stages. Although we do not know whether this division of labour is deliberate or a result of their funding agenda, compared to innovation input data such as R&D spending by agency50, our results reflect inter-agency coordination in innovation.
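A funder's position along the innovation process can be summarised from the heights of the nodes it funds. The sketch below is illustrative only: the record layout (funder name, node height), the normalisation by maximum height, and the funder labels are assumptions, not the procedure of section B.5.

```python
from collections import defaultdict
from statistics import median


def funder_height_profile(funded_nodes, max_height):
    """Summarise where each funder sits in the network order.

    `funded_nodes` is an iterable of (funder, height) pairs, one per
    funded document. Heights are normalised by `max_height` so that
    networks of different sizes can be compared.
    """
    heights = defaultdict(list)
    for funder, height in funded_nodes:
        heights[funder].append(height / max_height)

    return {
        funder: {"n_nodes": len(values), "median_height": median(values)}
        for funder, values in heights.items()
    }


# Hypothetical records: two made-up funders in a network of maximum height 10.
records = [
    ("funder_a", 8), ("funder_a", 10), ("funder_a", 6),
    ("funder_b", 1), ("funder_b", 3),
]
profile = funder_height_profile(records, max_height=10)
print(profile)
```

Plotting the full distribution of normalised heights per funder, rather than only the median, would give the kind of frequency-versus-height view described for Fig. 4.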
Validation
We validate the critical path by checking for documents that also appear in literature reviews published by subject-matter experts. Figure 5a shows the height versus depth diagram (as described in Fig. 2) for the Moderna mRNA vaccine, with additional annotations showing 352 documents found in three literature reviews on mRNA vaccines35,51,51. We found that the critical path (the hypotenuse) is heavily populated by documents referenced in the literature reviews. Figure 5b shows that documents found both in the Moderna Spikevax vaccine network and in the literature reviews have lower median criticalities of 0.0710 [0.169,0.574] (where we give the 25% and 75% quantiles in brackets) compared to documents found only in the former, where the median is 0.0333 [0.169,0.574]. Kolmogorov–Smirnov tests indicate that the criticalities of documents found in literature reviews differ significantly from those of documents not found in literature reviews (the p-values are always much less than 0.0001), validating the use of the critical path method to identify important innovation events.
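The two-sample Kolmogorov–Smirnov comparison can be sketched as follows. This computes only the test statistic (the largest gap between the two empirical distribution functions), using the standard library; in practice a routine such as `scipy.stats.ks_2samp` would also return the p-value. The function name and the sample values are hypothetical and are not the criticalities from the networks.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Made-up criticalities for documents in and not in literature reviews.
in_reviews = [0.00, 0.00, 0.05, 0.10]
not_in_reviews = [0.20, 0.35, 0.40, 0.60]
print(ks_statistic(in_reviews, not_in_reviews))
```

A statistic near 1 means the two groups barely overlap, and a statistic near 0 means their criticality distributions are nearly indistinguishable.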