Soihub: Center for Science of Information

Knowledge Thrust
The Center targets two broad fundamental areas of knowledge management, motivated by three transformative domains.

Information Science for Collaborative Computing and Artificial Intelligence-Based Inference: In many applications, high-value data is distributed among parties that share some common goals and have some individual goals. There are important questions about what data to share, and with whom to share it, to accomplish desired tasks. These issues are particularly important in the face of limited resources such as time, power, and bandwidth, and other considerations such as privacy and security. We will explore fundamental problems in distributed AI inference and collaborative computing, and particularly the role of information in these tasks.

Often parties may be reluctant to share information, even though all would gain from collaboratively computing using everyone's data. The reluctance to share can be quite rational if the drawbacks of revealing one's private, proprietary information, and the loss of control over its further dissemination and misuse, outweigh the benefits gained from sharing it. Quantifying the information gained and the private information leaked would enable rational cost-benefit analysis by potential collaborators. In the absence of such quantification, risk aversion dominates, and many potential "win-win" collaborations may not take place. One major challenge in this endeavor is the impact of time: the time-value of information versus the time to compute it (e.g., a data disclosure may be harmless if computing the confidential information from the disclosed data takes long enough). A second major challenge is mitigation: perturbing the disclosed data to protect private and confidential information without damaging its usefulness for collaborative computing and inference. A third challenge is quantifying the mitigation afforded by secure multiparty computation protocols, which make possible "computing with data without knowing it" yet must inherently leak the information that can be inferred from knowing one's own inputs and the computed outputs.
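
As a small illustration of what quantifying leakage under perturbation could look like, the sketch below assumes a single private bit that is disclosed after being flipped with some probability (randomized response) and measures leakage as the Shannon mutual information between the private bit and the disclosed bit. The binary setting, the function names, and the choice of mutual information as the leakage measure are illustrative assumptions, not a method described by the Center.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a Bernoulli(p) random variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def leaked_bits(prior_one: float, flip_prob: float) -> float:
    """Mutual information I(X; Y) in bits, where X is the private bit with
    P(X = 1) = prior_one and Y is X flipped with probability flip_prob.

    Uses I(X; Y) = H(Y) - H(Y | X); for this channel H(Y | X) is the
    entropy of the flip itself.
    """
    # Probability that the disclosed bit equals 1.
    p_y_one = prior_one * (1.0 - flip_prob) + (1.0 - prior_one) * flip_prob
    return binary_entropy(p_y_one) - binary_entropy(flip_prob)


if __name__ == "__main__":
    # More perturbation (flip probability closer to 0.5) leaks less.
    for flip in (0.0, 0.1, 0.25, 0.5):
        print(f"flip={flip:.2f}  leaked={leaked_bits(0.5, flip):.3f} bits")
```

With a uniform prior, no perturbation leaks the full bit and a flip probability of 0.5 leaks nothing; values in between trade privacy against usefulness, which is the kind of cost-benefit comparison the paragraph above calls for.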

In addition to computing and AI inference, another fundamental challenge we will explore is how to summarize complex or high-dimensional datasets, for example through nonlinear dimensionality reduction and various techniques for making complex datasets easy to interpret (data visualization). This is particularly important in many of the applications that will be investigated (e.g., biology, economics, social networks, environmental modeling).

Semantic and Goal-Oriented Communication: One of the goals of the Center is to propose a modern theory that integrates computing and communication right from the start. Such a theory would attempt to formalize the "problems" that devices attempt to solve by communicating, i.e., the goals of communication. By focusing on these goals, we hope that efficiency and reliability measures can be proposed that allow various solutions to be analyzed rigorously and compared quantitatively.

Energy and Critical Infrastructure Systems: One of the goals of the Center is to streamline the adoption of AI-based inference in support of the engineering activities of critical systems. Applying AI to such systems is challenging because of their complexity and the lack of a rigorous basis for quantifying the value of information to AI algorithms. Given the critical nature of nuclear power, the Center has a key thrust on the highly anticipated role of AI in supporting the 21st-century goals of nuclear technology: decarbonization and economic competitiveness. The Center is targeting R&D areas with high potential and immediate impact on nuclear power’s economics, safety, security, and safeguarding, covering activities that range from basic nuclear physics modeling, to design optimization and uncertainty inference in support of model validation and regulatory standards for first-of-a-kind systems, to the safeguarding of advanced reactors and fuel cycles, all adapted to the stringent needs of advanced nuclear systems.

Economics and Information Theory: Much of modern dynamic theory formulates models by examining how continuously optimizing agents will interact in markets. This has been important in allowing consistent treatment of economic behavior, but the models postulate continuous optimization, implying very rapid responses to policy changes and to market signals, whereas actual behavior is more sluggish. Approaches to address this (e.g., by postulating "adjustment costs") have an ad hoc flavor and are not grounded in direct microeconomic observations.

The existing "rational expectations" theories with continuous optimization imply infinite mutual information, in Shannon's sense, between the stochastic process for market signals and the stochastic process of a person's actions. At least qualitatively, recognizing that this rate of information flow must be finite explains a broad array of observed facts about economic behavior that have in the past been explained with ad hoc postulates of inertia or adjustment costs. Our work attempts to integrate a formal information-theoretic approach into dynamic economic theory. This seems to be a promising avenue for both explaining observations and improving the formulation of economic policy.
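
A minimal numerical sketch of the finite-information idea follows. It assumes a scalar Gaussian market signal observed through additive Gaussian noise, a standard textbook setting chosen here for illustration rather than a model taken from the Center's work; the function names are hypothetical. In this setting the mutual information between signal and observation grows without bound as the observation noise vanishes, while a finite information budget leaves the agent with residual uncertainty and hence a damped response.

```python
import math


def gaussian_mutual_information_bits(signal_var: float, noise_var: float) -> float:
    """I(X; Y) in bits for Y = X + noise, with X and noise independent
    zero-mean Gaussians of the given variances."""
    return 0.5 * math.log2(1.0 + signal_var / noise_var)


def posterior_variance(signal_var: float, capacity_bits: float) -> float:
    """Remaining variance of X after an observation carrying exactly
    capacity_bits of information about it (Gaussian case)."""
    return signal_var * 2.0 ** (-2.0 * capacity_bits)


if __name__ == "__main__":
    # Perfect tracking (noise -> 0) requires unbounded information flow.
    for noise_var in (1.0, 1e-2, 1e-4, 1e-8):
        bits = gaussian_mutual_information_bits(1.0, noise_var)
        print(f"noise_var={noise_var:g}  information={bits:.2f} bits")

    # A finite budget leaves uncertainty, so actions respond sluggishly.
    for capacity in (0.5, 1.0, 2.0):
        print(f"capacity={capacity} bits  "
              f"posterior_var={posterior_variance(1.0, capacity):.4f}")
```

The two functions are consistent with each other: an observation with noise variance equal to the signal variance carries half a bit and halves the agent's uncertainty about the signal.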

Learning and Inference in Networks: In order to model decision-making and behavior in networks, it is important to be able to efficiently estimate joint distributions over possible network structures and accurately assess the significance of discovered patterns. For example, one network mining task is to estimate the joint distribution of node attributes (e.g., the political views of users in Facebook) conditioned on the network structure, modeling dependencies among neighboring nodes (e.g., similar political views among friends). The resulting distribution is useful to jointly predict the unknown features of nodes in a network, exploiting dependencies among nodes to improve predictions. While there are some recently developed methods for this problem, little is known about the theoretical foundations of these methods or of the underlying estimation problem. Another fundamental problem is to estimate probability distributions over the graph structures themselves. Accurate estimation can improve understanding of the underlying network generation process and is a necessary precursor for anomaly detection in network activity graphs (e.g., intrusion and fraud detection). Current methods result in estimated models that fail to capture the natural variability of real-world social network domains. These and other foundational problems are pursued.
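
To make the idea of exploiting dependencies among neighboring nodes concrete, the sketch below uses a simple neighbor-averaging scheme: each node with an unknown binary attribute repeatedly takes the mean of its neighbors' current estimates, while nodes with known attributes stay fixed. This is one basic relational heuristic picked for illustration, not one of the methods the paragraph above refers to; the function name, the toy graph, and the 0.5 starting value are all assumptions.

```python
def propagate_labels(neighbors, known, iterations=50):
    """Estimate P(attribute = 1) for every node by neighbor averaging.

    neighbors: dict mapping each node to a list of adjacent nodes
               (every node must appear as a key; edges are symmetric).
    known:     dict mapping some nodes to a fixed probability in [0, 1].
    """
    # Unknown nodes start at 0.5, i.e. no initial preference.
    estimates = {node: known.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in known or not adjacent:
                # Observed or isolated nodes keep their current value.
                updated[node] = estimates[node]
            else:
                updated[node] = sum(estimates[n] for n in adjacent) / len(adjacent)
        estimates = updated
    return estimates


if __name__ == "__main__":
    # Hypothetical four-person friendship chain: a - b - c - d.
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    observed = {"a": 1.0, "d": 0.0}
    for node, prob in sorted(propagate_labels(graph, observed).items()):
        print(f"{node}: {prob:.3f}")
```

On this chain the unknown nodes settle between the two observed endpoints, with the node nearer to "a" leaning toward its value. The scheme produces per-node estimates only; it does not yield a full joint distribution, which is part of the open estimation problem described above.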

Environmental Modeling and Statistical Emulation: Many environmental and climatological processes are studied with the aid of deterministic computer models. The computer model encapsulates knowledge about the evolution of the process over space and time, typically through the numerical solution of a system of differential equations. Although such models are typically deterministic, many quantities are not known with certainty, including the value of the output at new input values, and the relationship of the model to true system quantities.

To know more, please contact

E-mail: soi-stc@purdue.edu
Phone: (765) 494-2908
