Network Programming
Research in wireless sensor networks is moving to prime time, and undergrad Jay Ayres plays a leading role.
By Joe Wilensky
Reprinted from Cornell Engineering Magazine, Summer 2003
Take a young, dynamic computer science professor and add an ambitious and intellectually curious undergrad. Mix well against the backdrop of Cornell's reputation as a leading research university. The result: cutting-edge research in wireless sensor networks and an investment in the next generation of researchers in the field of computer science.
Jay Ayres '04 is the student; Johannes Gehrke is the prof. But what are wireless sensor networks? The idea is a somewhat futuristic one that has only recently reached the prototype stage.
The Defense Advanced Research Projects Agency (DARPA) began funding work on this concept at the University of California, Berkeley, in 1998, with a vision of "smart dust" for military applications. In their scenario, thousands of tiny wireless sensors the size of dust motes could be scattered over a battlefield without arousing the enemy or risking human life. These "dust particles" would organize themselves as a network, gather data on such things as troop location or presence of chemical warfare agents, then relay significant information back to headquarters. Like the Internet (another DARPA brainchild), these wireless sensor networks may have even more utility in non-military settings. Once you wrap your brain around the concept--that the physical world can become a computing platform--any number of consumer and research applications come to mind. For example, "intelligent" buildings outfitted with sensors could measure and adjust temperature, noise, and light and even respond to queries to report whether Johannes is in his office or if there's an empty seat available in a meeting room. Used in the environment, such networks could detect particular animal species and record patterns of movement or migration. Deployed in a forest, sensor networks could monitor fire emergencies. Networks could be used to control inventory, monitor product quality, and provide an interface for people who are disabled.
Clearly there is vast potential across a wide range of fields for using networks made up of many small sensor nodes, but there are many problems to be worked out, including power consumption, computing power, and communication quality. Working with Gehrke's research group, Ayres is taking on the challenge of making the sensor network itself do much of the processing of queries, thereby making the system far more flexible and adaptable, and more powerful for the user.
A Cornell Presidential Research Scholar, Ayres had little experience with research before coming to Cornell, other than a short paper on computer graphics he wrote in high school for a class assignment. "It wasn't anything like what I'm doing now," he says. "I didn't really know what college-level research would entail. I knew Cornell was a large research university, and that the research opportunities would definitely be something I would want to take advantage of while I'm here."
Up to 75 Cornell Presidential Research Scholars (CPRS) are admitted to Cornell each year in all seven undergraduate colleges. The four-year program offers each student an opportunity to work with a faculty mentor on an individualized program of faculty-directed research.
All CPRS students attend a colloquium during their freshman year to get acquainted with some of the research opportunities on campus. Ayres became involved with the Cornell Database Group, a group of faculty, researchers, and students in the Department of Computer Science who work on new database and data mining technologies.
Gehrke, an assistant professor in computer science and a member of the database group, received his Ph.D. from the University of Wisconsin at Madison and came to Cornell in 1999, just a year ahead of Ayres. He welcomed the collaboration with an eager freshman looking for exciting research work.
"It's like an investment," Gehrke says of working with undergraduates. "The first year, there's a very steep learning curve. It's during the second year that students start actually becoming productive." Ayres admits he didn't come to Cornell with much research experience, which he says is true for most freshmen. During his first winter break, he started reading research papers to get acquainted with the area of data mining research, in which he worked initially. "When I first started reading these research papers, it was very daunting just trying to understand the language," he says. "It was extremely technical. That was the biggest hurdle to overcome."
Ayres began his research with Gehrke by working on the Himalaya data mining project. Data mining uses algorithms to extract useful information from very large databases. Imagine a database, kept by a grocery chain, of every customer receipt issued in every store throughout the history of the chain's existence. (Such databases are no longer just imagined.) Through data mining, customer buying patterns can be analyzed by finding sequences in the database. As a simple example, Gehrke says, data mining could be used to figure out what percentage of customers who bought milk on one visit bought bread on the next visit.
His first year on the project, Ayres helped develop an algorithm that used a novel technique to quickly mine huge databases and return results. "At the time and probably even up until now, it is the fastest published algorithm for this problem that's out there," Gehrke says.
Ayres explains that the algorithm, which was implemented in C++, uses a bitmap method to store the transactional database while the algorithm operates on it. That allows the algorithm to use simple Boolean AND/OR operations to find larger and larger sequences throughout the database.
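To give a feel for how a bitmap representation lets Boolean operations find sequences, here is a simplified sketch. It is not the authors' C++ implementation: it uses Python integers as bitmaps, handles only one item per step of the sequence, and the names item_bitmaps, after_first, and support, along with the toy data, are hypothetical. Each item gets one bitmap per customer, with bit i set if the item was bought on visit i; extending a sequence by one item is then a bit transformation followed by an AND.

```python
from typing import Dict, List, Set


def item_bitmaps(visits: List[Set[str]]) -> Dict[str, int]:
    """One integer per item; bit i is set if the item was bought on visit i."""
    bitmaps: Dict[str, int] = {}
    for i, basket in enumerate(visits):
        for item in basket:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def after_first(bitmap: int, n_visits: int) -> int:
    """Set every bit strictly after the earliest set bit and clear the rest."""
    if bitmap == 0:
        return 0
    lowest = bitmap & -bitmap            # isolate the earliest visit
    all_bits = (1 << n_visits) - 1       # mask covering every visit
    return all_bits & ~((lowest << 1) - 1)


def support(customers: List[List[Set[str]]], sequence: List[str]) -> int:
    """Count customers who bought the items of `sequence` in order,
    each on a strictly later visit than the previous one."""
    count = 0
    for visits in customers:
        maps = item_bitmaps(visits)
        current = maps.get(sequence[0], 0)
        for item in sequence[1:]:
            # Visits after the earliest match so far, ANDed with the next item.
            current = after_first(current, len(visits)) & maps.get(item, 0)
        if current:
            count += 1
    return count


# Hypothetical toy data: three customers, visits in time order.
customers = [
    [{"milk"}, {"bread"}, {"jam"}],
    [{"bread"}, {"milk"}],
    [{"milk", "bread"}, {"jam"}, {"bread"}],
]
print(support(customers, ["milk", "bread"]))
print(support(customers, ["milk", "bread", "jam"]))
```

Because each extension reuses the bitmap of the shorter sequence, longer and longer sequences can be tested without rescanning the original receipts, which is the intuition behind "larger and larger sequences" above.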
"One of the ways the algorithm can be described is that it uses clever data structures," Gehrke says, "which are optimized for our current processors and for the specific problem." Gehrke describes the concept of a "market basket" what a customer buys on one visit, whether it's to a physical store or a website. What the algorithm can do is find temporal or sequential patterns in market baskets over time that suggest what customers like and how they make purchases, he says. An example of such a pattern might reveal a number of customers who first purchased the book The Lord of the Rings and later bought the second and third books in the trilogy. The algorithm can cull these temporal patterns from very large amounts of data.
Ayres worked on the data mining project with Gehrke for about a year and a half and published a paper, "Sequential Pattern Mining Using a Bitmap Representation" (SPAM), which they presented at the Association for Computing Machinery's International Conference on Knowledge Discovery and Data Mining in 2002.
Since then, Ayres has been working with Gehrke on the Cougar project. "The Cougar system investigates a novel paradigm of interacting with wireless sensor networks," Gehrke explains. By abstracting the sensor network as a database, users can program the sensor network in a declarative language--the sensors are told what to do without specifying how to do it. Everything the sensors do--from finding an average temperature over a geographic area to tracking a moving object--relies on the Cougar system to optimize user queries and to implement them across the network.
By setting the system up this way, the network can be constantly customized for a variety of applications. Sensors can be added to the network at any point in time, and they can be queried in an ad-hoc way without requiring the user to write a complicated program. A traditional sensor network, in contrast, relies on individual sensors programmed for specific applications with a predefined set of actions they can take and with a predefined set of data to be extracted. "The novelty lies in the in-network processing," Gehrke says. "The network is not just being used as a big data-gathering system, with data sent to a gateway node. For example, there might be aggregation of the data taking place within the network, or other types of processing." This not only simplifies the programming of the sensor network, making it easier for applications to use, but also saves energy. "An application can write the kinds of queries it needs," Gehrke explains. "The application doesn't have to know how to write code on the sensors or how to disseminate the data from the sensors back to the gateway node."
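The idea of aggregating data inside the network can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Cougar system's code: it assumes the sensors form a tree rooted at the node nearest the gateway, it treats the number of radio transmissions as a rough stand-in for energy use, and SensorNode, partial_average, raw_hops, and merged_hops are hypothetical names. For an average-temperature query, each node merges its own reading with the partial (sum, count) results from its children and passes a single message upward, instead of relaying every raw reading hop by hop.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SensorNode:
    reading: float
    children: List["SensorNode"] = field(default_factory=list)


def partial_average(node: SensorNode) -> Tuple[float, int]:
    """Return (sum, count) of readings for the subtree rooted at `node`."""
    total, count = node.reading, 1
    for child in node.children:
        child_sum, child_count = partial_average(child)
        total += child_sum
        count += child_count
    return total, count


def raw_hops(node: SensorNode, depth: int = 0) -> int:
    """Transmissions if every raw reading is relayed unmerged to the root."""
    return depth + sum(raw_hops(child, depth + 1) for child in node.children)


def merged_hops(node: SensorNode) -> int:
    """Transmissions if each node sends one merged partial result to its parent."""
    return sum(1 + merged_hops(child) for child in node.children)


# Hypothetical five-node network; the root is the node next to the gateway.
root = SensorNode(20.0, [
    SensorNode(22.0, [SensorNode(24.0), SensorNode(26.0)]),
    SensorNode(18.0),
])

total, count = partial_average(root)
print("average temperature:", total / count)
print("transmissions without aggregation:", raw_hops(root))
print("transmissions with aggregation:", merged_hops(root))
```

Even in this tiny tree the merged approach needs fewer transmissions, and the gap widens as the network gets deeper, which is one way to see why in-network processing can save energy.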
Scientists could deploy these sensors and use a query system like this in any number of applications--bird or other animal habitats, for example. The researchers wouldn't be tasked with programming the sensor network at the lowest level, but they would have flexibility in changing variables, such as what data they need or the frequency with which they make queries to the network.
The small, commercially available sensors are each like a miniature desktop computer, Gehrke says. They use a special operating system for networked sensors, the TinyOS operating system developed at the University of California, Berkeley. Continuing challenges include reducing the energy use of each sensor, supporting more sophisticated queries, and improving the communication structures between sensors.
The Cougar project is supported by the National Science Foundation, the Defense Advanced Research Projects Agency, the Cornell Information Assurance Institute, and a gift from Intel.
Besides looking at systems issues, Gehrke is also working on developing a graphical user interface (GUI) that individuals might use to query the sensor network, as well as the code on the sensors themselves. Another CPRS student, Joel Ossher '06, is working with Gehrke on the GUI end of the project.
Gehrke says involving undergraduates in research is one of the exciting opportunities at Cornell. "Although I could probably work more efficiently only with graduate students, it's such a nice experience to see undergraduate students come in who do not know how to do research, and then to really see them grow into a stage where they can do research on their own," he says.
Ayres says the professor's enthusiasm for working with undergraduates on new research projects was immediately apparent. "He really spends a lot of his time with the undergraduates on his research team," Ayres says. "It's a lot of one-on-one time, just discussing the research. He clearly is very enthusiastic about what he's doing and the research itself."
Ayres speaks with passion about the value of undergraduates conducting research alongside not only other undergraduates, but master's students, doctoral students, and a faculty mentor. "This experience has given me exposure to what it's like to be a Ph.D. student," he says. Ayres plans to pursue at least a master's degree and has found himself drawn to the wireless sensor network area of research. He enjoys looking at novel areas of research and reading new research papers to expand his knowledge.
Computer science is one of the most active and important research fields today, Gehrke says. "Computation is becoming the foundation for a lot of the scientific endeavors of today." He notes that researchers in his group are collaborating with physicists, astronomers, and biologists on current projects. "Computer science is one of the disciplines that permeates all these fields," he says. It's not just core computer science anymore, but the application of computer science techniques to other areas, that makes this collaboration possible--and crucial to the development of new science.
Students like Ayres can develop invaluable skills in a major like computer science by conducting research with a faculty mentor, Gehrke stresses. "It depends on the skills the students bring along and their willingness to do research and to spend the time learning it. If students devote enough time to it, they can make tremendous contributions to research projects and lift their educational experience here at Cornell to a new level."
Joe Wilensky is a staff writer in Cornell's Office of Communications and Marketing Services.