### NetREX-CF—method overview

The NetREX-CF model is a data integration framework for reconstructing GRNs by organically utilizing both gene expression *E* and a set of prior networks *P* = {*P*^{1}, . . . *P*^{d}}. The main idea behind the NetREX-CF model is an integration of two complementary optimization strategies: (i) a machine learning component designed based on CF that is able to infer hidden features from the current observed prior networks *P* and utilize these features to recommend an improved GRN and (ii) a sparse NCA-based network remodeling component that can refine the topology of a GRN based on given gene expression *E*. These two computational components operate alternatively. The CF component recommends new edges to the current GRN and the sparse NCA-based network remodeling component screens the recommended edges and keeps the edges that are essential to explain the expression of a given gene. Once the sparse NCA-based network remodeling component confirms some of the recommended edges, the CF component further utilizes those retained recommended edges to make new edge recommendations for the sparse NCA-based network remodeling component to further examine (illustrated in Fig. 1).

Computationally, the system illustrated in Fig. 1 is achieved by simultaneous optimization of the following sets of variables: (i) the activities of TFs (matrix *A*), (ii) a weighted GRN (matrix *S*), and (iii) two feature matrices: the hidden features for target genes (*X* where the *i*th row *x*_{i} represents the hidden feature vector for gene *i*) and the hidden features for TFs (*Y* where the *j*th row *y*_{j} represents the hidden feature vector for TF *j*). The matrix *A* is optimized by the sparse NCA-based network remodeling component while the matrices *X* and *Y* are optimized by the CF component. Notably, the matrix *S* is the connection between the aforementioned two components and should be optimized by considering both components.

Formally, \(E\in {{\mathbb{R}}}^{n\times l}\) is the matrix of expression data of *n* genes in *l* experiments and prior network \({P}^{k}\in {{\mathbb{R}}}^{n\times m},\forall k\) is a weighted adjacency matrix of the bipartite graph that records the prior knowledge of regulations between *m* TFs and *n* genes. Matrix \(A\in {{\mathbb{R}}}^{m\times l}\) is the TF activity for *m* TFs in *l* samples and \(S\in {{\mathbb{R}}}^{n\times m}\) is a weighted GRN. We further define penalty matrix *C* and observation matrix *B* based on the set of prior networks *P*. The matrix *C* is used in CF component. For edges in the prior, the corresponding elements in *C* will assign larger values to make sure those edges will be kept in the final prediction. For edges not in the prior, the corresponding elements in *C* will assign smaller values to encourage new edges as recommendations. The matrix *B* is used to indicate which edges have prior information. Each element in *C* can be computed by \({C}_{ij}=1+a{\sum }_{k}{P}_{ij}^{k}\) (*a* = 60 suggested by ref. ^{18}). If more than one prior network suggests the regulation between the *i*th gene and the *j*th TF, then *C*_{ij} tends to have a larger value. Large *C*_{ij} would enforce giving the regulation between the *i*th gene and the *j*th TF a lower ranking. Each element in *B* is binary and can be computed by *B*_{ij} = 1 if \({\sum }_{k}{P}_{ij}^{k}\,\ne\, 0\) and *B*_{ij} = 0 otherwise. \(X\in {{\mathbb{R}}}^{n\times h}\) contains feature vector *x*_{i} for gene *i* and \(Y\in {{\mathbb{R}}}^{m\times h}\) contains feature vector *y*_{j} for TF *j*. Then, our optimization problem is formalized as following:

$$\begin{array}{rlr}\mathop{\min }\limits_{S,A,X,Y}&{{{{{{{\mathcal{H}}}}}}}}(S,A)+\lambda {{{{{{{\mathcal{F}}}}}}}}(S,X,Y)&\\ s.t.&{\left\Vert {x}_{i}\right\Vert }^{2}\le 1,\,\forall i\\ &{\left\Vert {y}_{j}\right\Vert }^{2}\le 1,\,\forall j.\end{array}$$

(1)

where \({{{{{{{\mathcal{H}}}}}}}}(S,A):= {\left\Vert E-SA\right\Vert }_{F}^{2}+{\lambda }_{A}{\parallel} A{\parallel }_{F}^{2}+{\lambda }_{S}{\parallel} S{\parallel }_{F}^{2}+{\sum }_{ij}{\eta }_{ij}{\parallel} {S}_{ij}{\parallel }_{0}\) is the sparse NCA-based network remodeling component; \({\lambda }_{A}{\parallel} A{\parallel }_{F}^{2}+{\lambda }_{S}{\parallel} S{\parallel }_{F}^{2}\) are standard regularization terms and ∑_{ij}*η*_{ij} ∥*S*_{ij}∥_{0} induces sparsity of a given prior GRN and therefore only essential edges that help to minimize \({{{{{{{\mathcal{H}}}}}}}}(S,A)\) are retained. ∥*S*_{ij}∥_{0} is the *ℓ*_{0} norm that is 1 if *S*_{ij} ≠ 0 and 0 otherwise.

In (1) \({{{{{{{\mathcal{F}}}}}}}}(S,X,Y):= {\sum }_{i,j}{{{{{\varOmega}} }}}_{ij}{({{{{{\varTheta}} }}}_{ij}-{x}_{i}^{T}{y}_{j})}^{2}\) optimizes the hidden features *X* and *Y* of the CF component; *Θ*_{ij} is a binary matrix of edges to be predicted by the hidden features in the given iteration and *Ω*_{ij} encodes penalties that guide the predictions. Both *Θ*_{ij} ≔ ∥*S*_{ij}∥_{0} ⊕ *B*_{ij} and \({{{{{\varOmega}} }}}_{ij}:= {\bar{C}}_{ij}{\parallel} {S}_{ij}{\parallel }_{0}+{C}_{ij}(1-{\parallel} {S}_{ij}{\parallel }_{0})\) are defined based on ∥*S*_{ij}∥_{0} and the penalty matrix *C*_{ij} is built from the prior information . Detailed explanation of *Θ*_{ij} and *Ω*_{ij} are provided in Method Details section. For the initialization step, both *Θ*_{ij} and *Ω*_{ij} are defined based on the prior networks only while in the subsequent steps they also take into account the results of the sparse NCA-based network remodeling component (illustrated in Fig. 1). To solve the problem (1), we first put all continuous terms together and define \(H(S,A):= {\left\Vert E-SA\right\Vert }_{F}^{2}+{\lambda }_{A}{\parallel} A{\parallel }_{F}^{2}+{\lambda }_{S}{\parallel} S{\parallel }_{F}^{2}\) and then put all the non-continuous terms together and define \(F(S,X,Y):= {\sum }_{i,j}{{{{{\varOmega}} }}}_{ij}{({\parallel} {S}_{ij}{\parallel }_{0}\oplus {B}_{ij}-{x}_{i}^{T}{y}_{j})}^{2}+{\sum }_{ij}{\eta }_{ij}{\parallel} {S}_{ij}{\parallel }_{0}\). Then the optimization problem has a general format of an objective function as *Φ*(*S*, *A*, *X*, *Y*) = *H*(*S*, *A*) + *F*(*S*, *X*, *Y*), where *H*(*S*, *A*) is continuous but non-convex and *F*(*S*, *X*, *Y*) is a composite function of *ℓ*_{0} norm of elements of *S* and other variables so it is neither continuous nor convex. More importantly, ∥*S*_{ij}∥_{0} is coupled with *x*_{i} and *y*_{j}, so that ∥*S*_{ij}∥_{0} cannot be separated from *F*(*S*, *X*, *Y*) as a distinct term. To the best of our knowledge, there has been no known method that can optimize such a complex and non-convex function involving the inseparable *ℓ*_{0} norm. To fill this gap, we propose here an algorithm, GPALM that generalizes the so-called PALM method^{22} and solves a class of problems of the format above, under the assumption that *F*(*S*, *X*, *Y*) is lower semi-continuous (see Supplementary Note 1). The GPALM method is fully described in Supplementary Note 2 where we also formally prove its convergence. The source code of NetREX-CF is available at: https://github.com/EJIUB/NetREX_CF.

### Validation and benchmarking NetREX-CF on yeast data

To demonstrate the capability of our GRN reconstruction method, we tested using datasets from multiple species, which include yeast, fruit fly, and human. For yeast, we collect multiple datasets that measure different perspectives of gene regulation. These datasets include TF ChIP^{7,33,34}, TF DNA binding motif^{7,35}, genetic knockdown^{7,36,37}, and yeast gene expression^{7,38,39,40}. TF ChIP, motif, and genetic knockdown datasets, serving as prior knowledge for TF-gene interactions in the yeast GRN. The details of these priors are summarized in Table 1 and the overlap among priors is illustrated in Table 1. We further utilize TF-gene interactions extracted from YEASTRACT database^{41} as a gold standard to validate the performance of GRN reconstruction. These gold standard TF-gene interactions are supported by both DNA binding and expression evidence. The details of the gold standard TF-gene interactions and their overlap with the prior datasets are shown in Table 1. Results generated by NetREX-CF are benchmarked against the results obtained from the published sequential methods, all of which are GRN prediction methods that are able to use prior knowledge. In the sections that follow, we go into detail about the comparisons between two popular approaches that only consider gene expression, GENIE3^{42} and GRNBoost2^{43}, as well as prior-based approaches like NetREX-CF, MerlinP^{7}, NetREX^{2}, LassoStARS^{6}, the original CF^{18}, the summation of all prior knowledge (PriorSum), and a technique that assigns a random confidence score (uniformly distributed between 0 and 1; hereafter, a random method). For a detailed description of parameter selection for competing methods, we refer the reader to Supplementary Note 3.

To ensure an impartial comparison, we use Average Rank scores (ARS)^{18}. For each method and for each gene *i*, we can obtain a list of TFs that are predicted to regulate gene *i* and sort this TFs by the confidence of the prediction (most confident have higher rank). We use \({r}_{ij}^{g}\) to denote the percentile-ranking of TF *j* within the ordered list of all TFs for gene *i*. Thus, \({r}_{ij}^{g}=0 \%\) means that TF *j* is predicted with the highest confidence to regulate gene *i*, preceding all other TFs in the list. On the other hand, \({r}_{ij}^{g}=100 \%\) when TF *j* is predicted to be the least possible TF for gene *i* or there is no prediction between TF *j* and gene *i* yielded by the method. Based on the gold standard TF-gene interaction dataset *I*, we set *I*_{ij} = 1 if TF *j* regulates gene *i* in the gold standard dataset and *I*_{ij} = 0 otherwise. For any gene *i*, we use the average rank of the gold standard edges in the list of TF predicted to regulate gene *i* as the measure quality of the prediction:

$${\overline{rank}}_{i}^{g}=\frac{{\sum }_{j}{r}_{ij}^{g}{I}_{ij}}{{\sum }_{j}{I}_{ij}}$$

(2)

Lower values of \({\overline{rank}}_{i}^{g}\) are preferable, as they indicate gold standard TFs for gene *i* have a lower rank than others. Furthermore, the overall ranking considering all genes can be computed by

$${\overline{rank}}^{g}=\frac{{\sum }_{i}{\overline{rank}}_{i}^{g}}{\#{{{{{{{\rm{genes}}}}}}}}\,{{{{{{{\rm{in}}}}}}}}\,I}$$

(3)

The denominator is the number of genes that have gold standard TFs in dataset *I*.

Similarly, for each TF we can measure the quality of the sorted list of genes predicted to be regulated by it:

$${\overline{rank}}_{j}^{t}=\frac{{\sum }_{i}{r}_{ij}^{t}{I}_{ij}}{{\sum }_{i}{I}_{ij}},\quad {\overline{rank}}^{t}=\frac{{\sum }_{j}{\overline{rank}}_{j}^{t}}{\#{{{{{{{\rm{TFs}}}}}}}}\,{{{{{{{\rm{in}}}}}}}}\,I},$$

(4)

where \({r}_{ij}^{t}\) denotes the percentile-ranking of gene *i* within the ordered list of all genes for TF *j* and \({\overline{rank}}_{j}^{t}\) is the average rankings for the gold standard genes for TF *j*. \({\overline{rank}}^{t}\) is the overall average rankings considering all TFs.

Figure 2a illustrates the comparison between the competing algorithms in terms of average rankings of gold standard TFs for each target gene. The sorted average ranking curve for NetREX-CF is below all other methods, indicating that the average rankings of gold standard TFs predicted by NetREX-CF for each gene are much lower than the rankings predicted by other methods. In the average rankings of gold standard genes among the genes predicted to be regulated by each TF, NetREX-CF still achieves the best average ranking score. Remarkably, PriorSum (the weighted edge summation of three priors, ARS = 0.228) is very competitive with NetREX-CF (Fig. 2b, ARS = 0.224) indicating that NetREX-CF takes the best advantage of the prior data. It is important to note that NetREX-CF outperforms the original CF (ARS = 0.288), which demonstrates that the integration of the CF model and sparse NCA-based model is beneficial.

Next, in order to demonstrate the advantages of NetREX-CF in predicting ranks for missing data (edges that do not appear in the prior knowledge datasets), we identified all edges that are in the gold standard dataset but are not supported by any prior dataset. Indeed, a large portion of the gold standard dataset (4064 out of 10,525 gold standard TF-gene interactions) is not covered by any prior dataset (Fig. 2f). Therefore, we can use these gold standard interactions with missing prior data to compare the ability of the competing methods in recovering rankings under the assumption of missing data. NetREX-CF achieves much lower ARS for those missing data (Fig. 2c, d). In summary, NetREX-CF achieves the lowest, thus the best, overall ARS (Fig. 2g).

### Reconstruction of the GRN for *Drosophila* Schneider 2 (S2) cell line and validation of NetREX-CF on new experimental data

While the datasets from yeast have been widely served as the only available benchmarking dataset for the evaluation of GRN inference methods, evaluating any GRN inference method using the only case would not produce convincing results. Therefore, for a more comprehensive assessment of network inference by NetREX-CF, we have performed gene expression profiling of *Drosophila* S2 cells, in which we selectively silenced each known TF. We chose S2 cells for these reasons. First, the *Drosophila* community extensively uses S2 cells, and there exists a huge amount of public resources. Such a resource includes TF binding profiling (i.e., ChIP-chip/Seq), performed by the Model Organism ENCyclopedia Of DNA Elements consortium (modENCODE^{44}). Second, the cells are suitable for high-throughput RNAi studies^{45}, whose outcome can be used to construct a benchmarking dataset when combined with genome-wide gene expression profiling.

As a prior, we collected publicly available datasets for S2 cells, which include 250 TF ChIP and 600 expression profiles (see Methods). TF DNA binding motifs were obtained from OnTheFly^{46}, FlyFactorSurvey^{47}, FLYREG v2^{48}, iDMMPMM^{49}, and DMMPMM^{50}. Similar to the yeast GRN, we used TF ChIP and TF DNA binding motifs as prior knowledge for TF-gene interactions in the S2 cell GRN. In addition, we also included the gene co-expression network (from 600 expression profiles) to increase the overall number of TFs with prior information (Table 2).

In order to validate the performance of our NetREX-CF and also to provide the finest dataset for GRN inference evaluation, we experimentally generated and sequenced a total of 1920 RNA-Seq libraries following RNAi knockdown of 488 genes (465 after filtering, see the Methods) that encode all expressed TFs in *Drosophila* S2 tissue-culture cells. Since this is a new dataset that is likely to be valuable outside the current study, we start with a robust analysis of the properties of the data. We used long double-strand RNA molecules (dsRNA) to selectively silence target TFs using the bathing method^{45}. For the most of targets (452 out of 488), we used two different long dsRNAs and these were biologically duplicated to give four measurements per gene. Because we were interested in using these data to trace gene network perturbations without confounding effects from long-term RNAi treatment, we chose the 1-day incubation for the main experiment. Overall, we observed that the median RNA expression was 23.4% after the knockdown (see Methods).

These knockdowns disrupted coherent pathways, not only those controlling the cell cycle, but also cellular differentiation and developmental processes (Fig. 3). We observed subsets of genes with specific functional categories display correlated gene expression changes upon the depletion of different TFs (Fig. 3b–e). For example, the genes in Cluster 2 in Fig. 3a have enriched Gene Ontology (GO) terms in “growth” (GO:0040007, adj. *p* value = 1.89e–06, Hypergeometric test corrected with Benjamini and Hochberg method) and “regulation of protein modification process” (GO:0031399, adj. *p* value = 3.86e–08, Fig. 3b). On the other hand, the genes in Cluster 4 play a role in cell proliferation, which should be the most important phenotypic characteristic of dividing tissue-culture cells. Such genes showed enrichment of GO terms in “mitotic cell cycle process” (GO:1903047, adj. *p* value = 3.47e–12), “DNA replication” (GO:0006260, adj. *p* value = 4.21e–12) and “ribosome biogenesis” (GO:0042254, adj. *p* value = 8.76e–15) that are required for cell cycle progression (Fig. 3). Cluster 6 genes also function in cell proliferation with “ribonucleoprotein complex biogenesis” (GO:0022613, adj. *p* value = 3.25e–08) and “sister chromatid segregation” (GO:0000819, adj. *p* value = 1.29e–05), although this cluster includes metabolic genes as well (Fig. 3). Remarkably, RNAi-based differential expression of Cluster 4 and 6 genes demonstrated an anti-correlation to that of Cluster 10 genes (Pearson’s *r* = –0.86), which comprise developmental genes that function in “post-embryonic animal organ development” (GO:0048569, adj. *p* value = 2.30e–44), “metamorphosis” (GO:0007552, adj. *p* value = 4.65e–44), “neuron projection development” (GO:0031175, adj. *p* value = 2.57e–42), or “wing disc development” (GO:0035220, adj. *p* = 4.59e–42, Fig. 3).

We further noticed that knockdowns of TFs with overlapped functions result in positively correlated gene expression changes. For example, we did GO enrichment analysis for TFs in the group “a” in Fig 3a. We found that TFs in group “a” are enriched in “instar larval or pupal morphogenesis” (GO:0048707, adj. *p* = 2.05e–04) and “metamorphosis” (GO:0007552, adj. *p* = 2.93e–04). Intriguingly, those TFs, when depleted, commonly reduced metabolic gene expression that function in the “amino sugar catabolic process” (GO:0046348), which is the function of genes in Cluster 8 (genes in Cluster 8 in Fig. 3a are enriched in GO:0046348 with adj. *p* = 0.00011) or “monocarboxylic acid catabolic process” (GO:0072329), which is the function of genes in Cluster 12 (genes in Cluster 12 in Fig. 3a are enriched in GO:0072329 with adj. *p* = 0.012). Likewise, we found a group of TFs (“b” in Fig. 3a) whose knockdown upregulates growth or protein metabolism-related genes (Cluster 2 in Fig. 3a), but downregulates cell cycle or carbon metabolism genes (Cluster 6 in Fig. 3a). Altogether, these observations indicate that coordinated responses exist in the *Drosophila* S2 gene network against the perturbations from individual TF knockdown. We then utilized this RNAi data to build an RNAi-based network and used this network as the benchmark dataset for performance evaluation of the competing methods. For the downstream analysis, we defined differentially expressed genes based on log 2-fold changes (either >1 or <−1) or adjusted *p* value (<0.05). Details of the RNAi network and its overlap with the other prior networks are illustrated in Table 2.

Using the S2 cell gold standard from the RNAi experiments, we evaluated the performance of our NetREX-CF with MerlinP^{7}, NetREX^{2}, LassoStARS^{6}, the original CF^{18}, PriorSum, GRNBoost2^{43}, GENIE3^{42}, and the method that makes a random choice. We describe the parameter selection for competing methods in Supplementary Note 3.

Figure 4a exhibits the comparison between the competing algorithms in terms of average rankings of TFs for each target gene in the benchmark dataset. The curve of NetREX-CF is below all other methods, indicating, on average, NetREX-CF ranks TFs identified from RNAi knockdown data lower than other competing methods. In addition, Fig. 4b illustrates the comparison between the competing algorithms in terms of average rankings of target genes for each TF in the benchmark dataset. Intriguingly, many of the tested methods failed to perform better than the random method. Similar observations have been made by previous studies^{3,4}. Only NetREX-CF displayed better performance than the random methods, in terms of the overall ARS, far surpassing all the other methods (Fig. 4c). However, when the ranks of TFs were considered, NetREX-CF has better performance for 182 out of 458 TFs; its prediction power started to be weaker than the random method for the rest of the 276 TFs, analogous to other methods. Still, this result is far better than other methods that only predicted approximately the top 100 TFs (GENIE3 and GRNBoost2), or performed worse than the random methods for all the TFs. We believe that this is due to the TFs that regulate a large number of target genes (see Discussion).

For further comparison between the benchmark data and the GRNs inferred by each method, we performed Gene Set Enrichment Analysis (GSEA). We tested whether the target genes for each TF in the benchmark dataset are statistically enriched on the top of the ranked gene list (*p* value < 0.01), generated by each method. We found that NetREX-CF achieves the best performance, compared to the other methods, by significantly recovering behaviors of 94 TFs in the benchmark network (Fig. 4d). It is worth mentioning that this number is larger than the summed number of TFs confirmed by the original NetREX (=38) and CF (=50), which represents the synergy between the two algorithms in NetREX-CF. GENIE3 followed NetREX-CF with 76 TFs recaptured.

To evaluate the biological relevance of the inferred GRNs, we compared NetREX-CF and GENIE3, which are the two best methods based on our validation. We performed GSEA analysis and investigated TFs whose inferred target genes are significantly enriched for two Gene Ontology (GO) terms: mitotic cell cycle (GO:0000278), or nervous system development (GO:0007399). Potential target genes under these functional terms are expected to be activated or repressed, respectively, in *Drosophila* S2 cells, which is a tissue-culture cell line derived from the late embryos^{51}. NetREX-CF predicted a total of 172 TFs for the regulation of mitotic cell cycle genes when the number was 26 for GENIE3 (Fig. 4e). There was a total of 10 TFs inferred by both methods. The overlapped TFs include Myb, which is a fly ortholog of human MYB and MYBL1, as well as Sin3A. These TFs are known to regulate the G2 or G2/M transition of the cell cycle^{52,53}. We also found that Grp, a product of *grp* (*grapes*) gene and the fly ortholog of human *CHEK1*, from the list. The TF plays a role in cell cycle checkpoint in response to DNA damage^{54}. In addition, while the overlap was partial, both methods identified members of the *E2F*/*RB* transcription factors, or *Myb*-interacting proteins, which include *E2f2* and *Rbf2* (GENIE3), *E2f1*, *Rbf*, *Mip120*, and *Mip130* (NetREX-CF). These TFs form a complex and regulate the G1/S transition of the cell cycle^{55,56}. Therefore, both NetREX-CF and GENIE3 could efficiently predict TFs that regulate mitotic cell cycle genes.

Since NetREX-CF (172) could predict more TFs than GENIE3 (26) as the regulator of the same functional group (mitotic cell cycle), we asked whether NetREX-CF has better efficacy than GENIE3. Remarkably, only NetREX-CF identified a set of TFs that are related to Polycomb and Trithorax Group [Polycomb (Pc), Polycomblike (Pcl), Pleiohomeotic-like (Phol), Enhancer of zeste E(z), Additional sex combs (Asx), and Scm-related gene containing four MBT domains (Sfmbt), Trithorax-related (Trr)]. While these TFs are well known for their regulation of development and differentiation^{57}, there also are involved in cell cycle control^{58,59,60,61,62}. Moreover, only NetREX-CF, but not GENIE3, predicted members of the mediator complex (*CycC*, *MED1*, *MED23*, *MED24*, and *MED26*) as regulators of the mitotic cell cycle genes. Multiple studies have reported the role of these components in cell cycle control or proliferation^{62,63,64,65}. We obtained a consistent result for the TFs that control the nervous system development (Fig. 4f). Both NetREX-CF and GENIE3 identified key regulators, such as *Kay* (Kayak, fly ortholog of human FOS) and *Jra* (Jun-related, fly ortholog of human *JUN* and *JUND*), or *Foxo* (Forkhead box, sub-group O), which are well studied for their roles in nervous system development in fruit flies and/or human^{66,67,68,69,70}. However, only NetREX-CF could capture, for example, *Sox14*, which is one of the master regulators of neurogenesis in *Drosophila*^{71}. Collectively, we observed that NetREX-CF outperforms GENIE3 in inferring biologically relevant GRNs for those functional categories.

Importantly, NetREX-CF surpassed the competing methods even when we did not use the TF binding prior (e.g., ChIP-Seq). ChIP experiments largely rely on the availability of suitable antibodies that can pull down TFs under the cross-linked condition. Therefore, we tested how the absence of ChIP data would change the performance of NetREX-CF using our *Drosophila* S2 cell benchmark dataset (Fig. 4g–i). The performance of GRNBoost2 and GENIE3 is not altered since they do not use any prior information (ARS = 0.497 for the TF predictions for both). In general, the performance of the prior-based methods (e.g., MerlinP, LassoStARS, the original NetREX, and PriorSum) became worse (compare Figs. 4a, g) when no TF binding prior was used (compare Figs. 4a, g). This was the case for NetREX as well (ARS 0.453 vs. 0.478 with, and without, the ChIP results, respectively), underscoring that the use of the TF binding prior greatly improves the GRN inference. However, NetREX-CF still outperforms all the competing methods even without the information, although the difference with the second-best, GENIE3, becomes smaller (ARS 0.478 vs. 0.497 for the gene prediction). This finding shows that NetREX-CF still functions and outperforms alternative approaches in the absence of ChIP-Seq or ChIP-chip data. However, we advise employing the TF binding prior wherever possible to get the best prediction with NetREX-CF.

### Reconstruction of cell-type-specific GRNs from human single-cell RNA-Seq studies

ScRNA-Seq technology is rapidly adopted by biologists for uncovering cell-type-specific transcription programs. To demonstrate that NetREX-CF can generate cell-type-specific GRNs using scRNA-Seq results, we benchmarked all competing methods using previously published human studies. To build GRN for human hepatocyte-like cells (hHep GRN), we collected gene expression profiles from a scRNA-seq analysis of induced pluripotent stem cells (iPSCs) in two-dimensional culture differentiating to hepatocyte-like cells^{31}. We utilized the following datasets as priors: (i) functional prior GRN from STRING^{72,73}, and (ii) non-specific prior GRN from DoROthEA, RegNetwork, and TRRUST^{73,74,75,76}. To evaluate the inferred hHEP GRN, we extracted all the cell-type-specific GRNs from ENCODE, ChIP-Atlas, and ESCAPE databases for ChIP-Seq data from the same or similar cell type^{73,77,78,79}. The summary of the single-cell RNA-Seq, as well as the prior and benchmark GRNs, are elaborated in Tables 3 and 4, respectively.

We compared our NetREX-CF with the prior-based methods (NetREX, MerlinP, and LassoStARS), which can incorporate the prior information to build the hHEP GRN as well as the expression-based methods (GRNBoost2 and GENIE3), which rely on only the scRNA-Seq data to build the hHEP GRN. These two methods were the best-performing methods for predicting directed networks using scRNA-Seq data in a recent evaluation, and therefore, provide a good baseline for the comparison^{4}. NetREX-CF achieves the lowest overall ARS by a large margin (Fig. 5a, ARS = 0.349, compared to 0.5 from the random method), indicating that the rankings of the TFs predicted by NetREX-CF are much lower than the competing methods from the benchmark. In the case of their target gene prediction, we found that all methods have comparable degrees of performance [Fig. 5b, c, ARS 0.486–0.574 with NetREX-CF performing the best]. We think that this is because the TFs in the benchmark dataset control thousands of genes, resulting in an ARS of about 0.5 (see the discussion section).

As the second example, we built the human embryonic stem cell GRN (hESC GRN) by using scRNA-Seq results of the differentiation protocol to produce definitive endoderm cells from human embryonic stem cells^{32}. Analogous to what we performed for building hHEP GRN, we utilized a functional prior network based on the STRING DB, non-specific prior networks from other databases, and cell-type-specific GRN from ChIP-Seq datasets (Tables 3 and 4 for the summary information). Consistent with our observation in the hHEP GRN, NetREX-CF demonstrated superior performance to the other methods in recovering the rank of TFs in the benchmark dataset (Fig. 5d, ARS = 0.366 compared to 0.500 from the second-best method, GENIE3). However, in agreement with the hHEP GRN results, recovering the rank of target genes in the benchmark dataset displayed less distinguishable performances for all the methods (Fig. 5e, f, ARS 0.489 (best, NetREX-CF) to 0.575 (worst, PriorSum)).