iPathCaseKEGG: An iPad interface for KEGG metabolic pathways
© Johnson et al.; licensee BioMed Central Ltd. 2012
Received: 23 April 2012
Accepted: 4 September 2012
Published: 10 January 2013
Kyoto Encyclopedia of Genes and Genomes (KEGG) is an online and integrated molecular database for several organisms. KEGG has been a highly useful site, helping domain scientists understand, research, study, and teach metabolisms by linking sequenced genomes to higher level systematic functions. KEGG databases are accessible through the web pages of the system, but the capabilities of the web interface are limited. Third party systems have been built over the KEGG data to provide extensive functionalities. However, there have been no attempts towards providing a tablet interface for KEGG data. Recognizing the rise of mobile technologies and the importance of tablets in education, this paper presents the design and implementation of iPathCaseKEGG, an iPad interface for KEGG data, which is empowered with multiple browsing and visualization capabilities.
iPathCaseKEGG has been implemented and is available, free of charge, in the Apple App Store (locatable by searching for “Pathcase” in the app store). The application provides browsing and interactive visualization functionalities on the KEGG data. Users can pick pathways, visualize them, and see detail pages of reactions and molecules using the multi-touch interface of iPad.
iPathCaseKEGG provides a mobile interface to access KEGG data. Interactive visualization and browsing functionalities let users to interact with the data in multiple ways. As the importance of tablets and their usage in research education continue to rise, we think iPathCaseKEGG will be a useful tool for life science instructors and researchers.
KeywordsKEGG iPad App PathCase Metabolic networks Pathways
Kyoto Encyclopedia of Genes and Genomes (KEGG) is one of the well-known biological knowledge bases, that links genomic information to high-level functions of the cell, for several organisms [1, 2]. KEGG, among others, provides the association between sequenced genomes and their correspondences in metabolic networks and biochemical pathways [1, 2]. Although the KEGG system has added over time, new tools and additional databases such as KEGG Brite, the core of the system is centered on KEGG Pathways, which specify the sets of reactions that participate in biochemical pathways for different organisms. KEGG Genes and KEGG Ligand databases specify information about the associated genes and the chemicals/molecules for KEGG Pathways, respectively. Through a web interface, KEGG provides maps of metabolic pathways for visualization purposes. KEGG pathways data is made available at an FTP site, and web services for third party applications are provided as an API.
PathCaseKEGG system is one of the PathCase family of systems, with a PathCaseKEGG database, populated from KEGG data. PathCaseKEGG system provides, over the web, additional browsing, querying, visualization, and analysis functionalities for KEGG data, such as interactive visualization . And, PathCase-SB, another PathCase system, integrates KEGG data with data from other data sources [4, 5] such as BioModels [6, 7]. Both PathCase systems provide hierarchical browsing; a query interface and a JAVA applet-based interactive visualization of pathways.
In this paper, we describe iPathCaseKEGG which is an iPad-based system with a mobile interface that accesses PathCaseKEGG, KEGG Pathways, and ENZYME databases, and provides browsing and interactive visualization functionalities on the accessed data. To the best of our knowledge, as of August 2012, there have been no attempts, with the exception of SimGene , towards providing a mobile interface for browsing and visualizing KEGG pathways. SimGene, designed for iPhone (not for iPad), is a gene-related information searching tool, and does not focus on pathway browsing or visualization. More specifically, given a gene symbol, SimGene searches many databases (Simbiot, Ensembl, NCBI, Gene Ontology, KEGG Pathways, PubMed, Genomic Variations and other databases) for more than 30 species, and provides annotated information about the specified gene.
iPad has been a revolutionizing mobile device with the multi-touch based interface. Apple has sold 9.25 million iPads only in the fiscal 2011 third quarter . With the releases of similar devices by other brands, tablets are becoming widespread. Ease of use with familiar gestures and mobility of tablets have contributed to their success. However, websites designed for computers are not optimized for mobile devices. Therefore, companies release their own apps, or redirect users to mobile versions of their web pages. To the best of our knowledge, there is no such app or mobile version for the KEGG web site.
Tablets also have started to attract attention in education. For instance, Apple has made an effort to revolutionize text books , and countries distribute tablets to school children (such as Turkey ). We think that KEGG is an invaluable resource for life sciences education, and there is a need for a mobile interface for instructors and researchers to access KEGG data. In this paper, we present iPathCaseKEGG, an iPad application to interface with KEGG data. The system comes with the following functionalities:
Interactive visualization of metabolic pathways,
Access to the KEGG web page regarding a pathway within the app,
Highlighting a node and close proximity,
Highlighting organism-specific subsets of reactions,
Enzyme details pane from ENZYME Database for a selected reaction through EC (Enzyme Commission) Numbers .
iPathCaseKEGG has been implemented, extensively tested for iPad 2, and is freely available in Apple App Store.
Overview of KEGG
As of July 23, 2012, KEGG consists of seventeen databases: KEGG ORTHOLOGY, GENOME , GENES, DGENES and SSDB in genomic information, KEGG COMPOUND, GLYCAN, REACTION, RPAIR, RCLASS and ENZYME in chemical information, KEGG PATHWAY, BRITE, MODULE, DISEASE, DRUG and ENVIRON in systems information. The most unique data object in KEGG is the molecular networks -- molecular interaction, reaction and relation networks representing systemic functions of the cell and the organism. Systemic functions capture experimental knowledge from literature, which are organized in three forms: Pathway maps - in KEGG PATHWAY, Hierarchical list (ontology) - in KEGG BRITE, Membership list - in KEGG MODULE and KEGG DISEASE. These databases constitute the reference knowledge base for biological interpretation of genomes and high-throughput molecular datasets through the process of KEGG mapping, which is a process to map elementary datasets (genes, proteins, small molecules, etc.) to network datasets (KEGG pathway maps, BRITE functional hierarchies, and KEGG modules). Currently there are four types of mapping operations available in KEGG: pathway mapping, brite mapping, module mapping and taxonomy mapping, which may involve molecular or non-molecular datasets (orthologs, modules, and organisms) and the network dataset (taxonomic tree). While the main focus of KEGG has been the molecular network, different types of data and knowledge, such as disease genes and drug targets, are also integrated as part of the KEGG molecular networks to support translational bioinformatics.
Overview of PathCaseKEGG
PathCase KEGG is a web-based system to store, browse, query, visualize and analyze KEGG metabolic pathways at different levels of genetic, molecular, biochemical and organismal detail. It integrates (i) browsing interface to access different pathway information and pathway-related information (processes, molecular entities, genes), (ii) an interactive client-side visualization tool for metabolic pathways, with powerful visualization capabilities, and with integrated gene and organism viewers, (iii) three distinct querying capabilities: keyword search function for pathways, processes, organisms, basic molecules, proteins and genes, an advanced querying interface for computer savvy users, and built-in queries for ease of use, that can be issued directly from pathway visualizations, (iv) pathway functionality analysis tools, such as the gene ontology viewer, and (v) pathway and process export and import capabilities.
Overview of iPad interface
iPad 2 (hereafter simply “iPad”) is a device about 7.3x9.5x0.3 inches with a 1024x768 pixel touch screen . The touch screen can detect more than one human finger at a time, allowing for actions such as zooming to be accomplished with natural gestures using two or more fingers. iPad runs iOS, a UNIX-based operating system which presents one full-screen application at a time and provides multithreading, persistent storage, wireless network access, and other services. This section describes only those iPad-specific user interface concepts that iPathCaseKEGG uses.
Unlike traditional computer displays, the iPad is designed to be held in any orientation, with the display’s contents rotating and resizing to fit the users perspective. There are four possible orientations, but the shortcuts “landscape” and “portrait“ suffice for describing and implementing most interfaces .
The most basic component of an iPad interface is the “view”. A view is a rectangular region that is capable of displaying content, receiving events such as gestures and device orientation changes, and delegating these tasks to “subviews”. Views can contain buttons, toolbars, lists, other views, and arbitrary graphical content. “Nested views” consist of a view hierarchy.
A “button” is a view that renders itself in enabled, disabled, and pressed states, and invokes some action when a touch is released inside its bounds.
“Toolbars” are views that can be inserted at the top or bottom of another view. They typically contain buttons, text labels, progress indicators, and more.
A “scroll view” contains another, larger view that the user can move around with one or two fingers. For example, one might be used to display a photo that is larger than the screen. Scroll views can be dragged around with one finger, or zoomed out by “pinching two fingers” together.
“Tapping twice” in the same location typically zooms in on the tapped location.
One common use for the scroll view is in a table view, used to display lists of content. The user can scroll up and down the list and tap a row to invoke some action. Some types of views exist only to contain other views. One view of this type is the “popover”. A popover appears temporarily on the screen, allowing the user to perform some quick interaction before dismissing it by tapping outside its bounds. The popover is a key component in the master-detail interface, a common arrangement of views in iPad applications.
Overview of iPathCaseKEGG capabilities
In addition to displaying a node’s long description from PathCaseKEGG, iPathCaseKEGG uses data from the ENZYME enzyme nomenclature database  to show more information for reactions for which EC numbers are available. ENZYME’s data comes from the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) . An example of ENZYME information in the sidebar is shown in Figure 7 on the left hand side.
iPathCaseKEGG’s architecture consists of a few well-defined components:
User interface objects present things like the master-detail interface, lists of pathways, organism selection, etc.
Web service request classes make HTTP requests to the PathCaseKEGG server, process the response, and notify other objects of the results.
Navigation user interface
PCDetailViewController responds to five different notifications, but only two are shown in Figure 10. The rest have to do with web services and pathway visualization interactions, which will be discussed later in this section. The remaining nodes in Figure 10, PCPathwayTableViewController and PCNodeInfoViewController, are displayed in the master content area. The master view makes use of a navigation stack. The top item on this stack is the PCMasterViewController, which pushes a PCPathwayTableController onto the stack when a pathway category is selected. When one of the pathways in that list is selected, a “show pathway” notification is posted, and a PCNodeInfoViewController is pushed onto the stack. This is the view controller that displays information for the current selection, so it responds to notifications sent by PCGraphViewController about changes in the user’s selection.
iPathCaseKEGG does not download the entire pathways data in PathCaseKEGG database at once. HTTP requests are made as new data is needed, and the result is cached in the device’s internal storage. It has a separate class for each web service, named PC < WebService > Fetcher. iPathCaseKEGG’s web service classes fire events to a central notification system when they have completed, and controller objects check to see if any new web services should be invoked due to dependencies being made available.
After the XML response has been parsed into one or more KEGG internal objects, these objects are not serialized. In other words, all objects generated by web services are built from the XML each time they are needed.
Text layer for drawing node labels. This layer contains one sub-layer for each node that renders the text for its label.
Node layer for drawing node shapes, and
Edge layer for drawing edges.
Once these layers are rendered, they are drawn to the screen by a scroll view, which takes care of translating and scaling them in response to user interaction.
Life cycle of a pathway visualization
Dynamic layouts with graphviz
PathCaseKEGG contains human-curated frozen layouts for some pathways, but not all. In order to display the pathways without frozen layouts, iPathCaseKEGG uses the Graphviz library. Graphviz is a system which, among other things, provides graph layout and visualization functions . It provides both command line and programmatic interfaces for entering, laying out, and rendering graphs. iPathCaseKEGG encapsulates the layout functionality of the Graphviz library in a class called GVGraph. An instance of this class represents a graph with nodes, a label for each node, and one directional edge that connect nodes.
Reading the ENZYME database
The ENZYME database is provided in a flat text file format [12, 15]. This format is not acceptable for random seeking behavior, so it has been converted to a SQLite database  for more efficient access. SQLite is a minimal implementation of SQL intended to be used as a storage medium for single-instance applications . iPathCaseKEGG uses the description, alternate name, catalytic activity, cofactor, and comments, ignoring the remaining fields.
Results and discussion
Tablets have revolutionized the mouse and keyboard based interface with a mobile and a multi touch based interface. Many gestures such zoom-by-pinch or swipe to turn the page are very user-friendly. Not to mention the new retina display, iPad interface has been a breakthrough so far. As stated before, tablets have already been adopted as an educational tool. We foresee that tablets will keep attracting more attention and will be more widely used as prices drop; universities that distribute laptops to students will start to give tablets; classes will be held using tablets such as iPads; heavy hardcopy textbooks will be replaced by e-books, and students will be able to carry thousands of them in a single tablet device. For our application, KEGG is an important source for life sciences researchers, and for biology/biochemistry education as it maintains extensive biochemical pathways data. With the interactive visualization interface of iPathCaseKEGG, users can interact with KEGG pathways using a multi-touch interface and gestures, which is a major improvement over the static visualizations provided by KEGG. We believe that iPathCaseKEGG is useful for (i) students who use KEGG to learn metabolic networks, (ii) instructors that teach material using a visual tool and (iii) researchers who would like to employ metabolic pathways in their research.
Comparison with Web-based PathCaseKEGG
Comparison of web-based PathCase KEGG and iPathCase KEGG interfaces
Molecule and reaction visualization
Four mouse-based navigation tools
Pan with one finger, zoom with two
Greying out reactions that are not present in the selected organism set
Moving nodes to rearrange the layout
Dynamic layout when no frozen layout is present
Saving layouts for future recall
Lists of processes, connected pathways, pathway annotations, and pathway-related queries in tabular form
Identifying the location of the gene producing the catalyzing enzyme
Hiding common molecules
Display of data from ENZYME database
Display of data from KEGG web site
The functionality “Greying out reactions that are not present in the selected organism set” is available in both PathCaseKEGG and iPathCaseKEGG, which allows the user to focus on only those reactions of the pathway available in the chosen organism set. The functionality “Moving nodes to rearrange the layout” of the pathway at hand is also available in both interfaces. As it turns out, biochemists always would like to have the same layouts for pathways, and automated layouts may need to be manually edited by some users; hence this functionality is considered essential for life sciences researchers. Also, this functionality gives the user the flexibility of changing the layouts based on their specific needs, i.e., emphasizing a certain part of a given pathway.
The functionality "Identifying the location of the gene producing the catalyzing enzyme" allows pathway analysis in terms of the locations of genes producing catalyzing enzymes of different reactions. This functionality can be viewed as a higher level pathway analysis functionality, and is not provided in iPathCaseKEGG.
Common molecules are those molecules that participate in many reactions, i.e., CO2, ATP, ADP, NAD+, NADP+, NH4+, NADPH, NH3, NADH, Oxygen, CoA, H+, H2O, H2O2. Common molecules can also be regarded as social network “hubs” of pathways. PathCaseKEGG allows users to “hide” common molecules when multiple pathways are drawn together as it reduces the visualization complexity of pathways. This feature is also not provided in iPathCaseKEGG since, due to its small screen size, we have opted not to allow users of iPathCaseKEGG to visualize multiple pathways together as one sub-network of the whole metabolic network.
Technical challenges and possible improvements
The biggest challenge of the project was controlling the passage of information from the PathCase servers to the iPad application, both in how the HTTP requests were made and how the data was translated into a form usable by the rest of the application. The data is requested from the PathCaseKEGG server in a much more piecemeal way. The data for each pathway is encapsulated entirely in a single request; so there is no need for complex dependency management or even data storage above and beyond simple HTTP request caching. The data models, HTTP requests, and parsing are cleanly separated.
When the project started, we underestimated the software system complexity involved in iPathCaseKEGG (as well as other iPathCase systems). In terms of software architecture, in the future, the system components that download the data, parse it, build model objects, as well as various controls should be re-factored to separate each of these tasks explicitly. Additionally, there should be a more formal data dependency tracking system to control the bulk download of the web service data. It is common to have HTTP requests for web services A, B, and C rely on data from request X, so there should be a formalized dependency graph that tracks this information.
iPathCaseKEGG’s biggest technical issue is a noticeable hang when selecting a large pathway to view, due to the HTTP requests and responses that involve creation and transfer of large (2–5 megabytes) XML documents, sometime with large frozen layout information, as well as dynamic layout creation delays due to the use of Graphviz. Caching pathways a priori has mostly, but not fully, eliminated this problem. However, when a new or revised pathway in the PathCaseKEGG system is encountered and it needs a dynamic layout generation, the system takes significant amounts of time (ranging from 30 to100 seconds) to request and receive the new/revised pathway, parse and produce a new layout, and display it. A delay of this magnitude sometimes leads users into thinking that the app has crashed, which is not acceptable in an online visualization environment. More effort should be spent for both (a) lazy processing and downloads of new pathways (perhaps as part of “software updates” to the app which are really data updates), and (b) moving processing to a separate thread to allow the main user interface to remain responsive.
Yet another technical challenge was rendering large graphs efficiently on a device with limited memory and processing power. Fortunately, the iOS application frameworks made pathway rendering possible, though various low-level optimizations were required at different stages of development.
Finally, similar to graph rendering, dynamic graph layouts accounted for a considerable amount of development time, despite the convenience of the Graphviz library to provide layouts on an as-needed basis.
iPathCaseKEGG has been a successful exercise in extending the PathCase architecture and philosophy to new devices and new users. It is one thing to design a web interface and server backend in tandem, and another to apply the server’s web service interface to an entirely new set of applications. iPathCaseKEGG shows that PathCase’s architecture serves the purpose for which it was built: to provide a platform for analysis and reference tools on top of multiple data sources for pathways. iPathCaseKEGG :
Allows browsing of all KEGG pathways,
Links reactions with the external ENZYME database, and
Allows access to the KEGG web page corresponding to any pathway.
The application adapts the basic PathCase pathway visualization interface concepts for a touch screen, specifically making heavy use of iOS idioms. This adaptation and iPathCaseSMDA, another PathCase iPad application already at the Apple App Store, provide insight about how pathway visualizations can be improved across all forms of input and display.
iPathCaseKEGG opens up many exciting possibilities for the use of tablets in educational and research environments. The best thing that can be done to make these possibilities become realities is to perform studies and have discussions with practicing life scientists and students. Formal and informal feedback should be gathered from possible users to determine the most effective future improvements. Even their simple presence in the iOS online store should provide helpful feedback (positive or negative) that will affect the development of both applications.
Interactive visualizations of complex biological data are no longer limited to immobile desktop workstations or cumbersome laptops; the iPad is even less intrusive than a physical textbook and can display information in new and interesting ways. iPathCaseKEGG brings interactive pathway visualizations, reference materials, and even research tools to desks, meetings, and couches.
Availability and requirements
Project name: iPathCaseKEGG
Project download source: Apple App Store
Operating system: iOS 5.
Programming language: Objective-C.
Other requirements: iPad2 or newer.
This research has been supported by the National Science Foundation grants DBI 0743705, DBI 0849956, CRI 0551603 and by the National Institute of Health grant GM088823.
- Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular datasets. Nucleic Acids Res. 2012, 40: D109-D114. 10.1093/nar/gkr988.PubMed CentralView ArticlePubMed
- Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27.PubMed CentralView ArticlePubMed
- Elliott B, Kirac M, Cakmak A, Yavas G, Mayes S, Cheng E, Wang Y, Gupta C, Ozsoyoglu G, Ozsoyoglu ZM: PathCase: pathways database system. Bioinformatics. 2008, 24 (21): 2526-2533. 10.1093/bioinformatics/btn459.View ArticlePubMed
- Cakmak A, Qi X, Coskun SA, Das M, Cheng E, Cicek AE, Lai N, Ozsoyoglu ZM, Ozsoyoglu G: PathCase-SB Architecture and Database Design. BMC Syst Biol. 2011, 5: 188-10.1186/1752-0509-5-188.PubMed CentralView ArticlePubMed
- Coskun SA, Qi X, Cakmak A, Cheng E, Cicek AE, Yang L, Jadeja R, Dash RK, Lai N, Ozsoyoglu G, Ozsoyoglu ZM: PathCase-SB: Integrating data sources and providing tools for systems biology research. BMC Syst Biol. 2012, 6: 67-10.1186/1752-0509-6-67. (to appear)PubMed CentralView ArticlePubMed
- Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, Hucka M: BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res. 2006, 34 (Database-Issue): 689-691.View Article
- BioModels database–A Database of Annotated Published Models.http://www.ebi.ac.uk/biomodels-main,
- SimGene. An application in Apple app store
- Apple Press Info.http://www.apple.com/pr/library/2011/07/19Apple-Reports-Third-Quarter-Results.html,
- Apple Education.http://www.apple.com/education/ipad/,
- Sabah Newspaper: Turkish Ministry of National Education distributes tablets in schools.http://english.sabah.com.tr/National/2012/02/01/distribution-of-free-tablets-to-students-has-begun,
- Bairoch A: The ENZYME database in 2000. Nucleic Acid Research. 2000, 28 (1): 304-305. 10.1093/nar/28.1.304.View Article
- Apple Inc. iOS human interface guidelines.http://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/MobileHIG/MobileHIG.pdf,
- Gansner ER, North SC: An open graph visualization system and its applications to software engineering. Software — Practice and Experience. 1999, S1: 1-5.
- The ENZYME nomenclature database user manual.http://enzyme.expasy.org/enzuser.txt,
- yFiles developer’s guide.http://docs.yworks.com/yfiles/doc/developers-guide/index.html,
- About SQLite.http://www.sqlite.org/about.html,
- iPathCaseSMDA. An application in Apple app store
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.