Data volumes in scientific, engineering, and commercial applications are expected to double every two years. With this continuing data explosion, it is necessary to store and process data efficiently by utilising the enormous computing power available in clusters and supercomputers. Data analytics and intelligence retrieval pose crucial challenges for high-performance, large-scale data processing, including exploiting the parallelism of current and upcoming computer architectures, developing programming models and algorithmic methodologies for data mining, and addressing engineering issues in the implementation, deployment and operation of commercial data-intensive computing systems.
These challenges are significant and are expected to grow as data and computing systems continue to scale. More importantly, they may compromise, and sometimes even degrade, the performance, efficiency and scalability of dedicated data-intensive computing systems.
There is no doubt in the industry and research community that data-intensive computing has been growing in importance and will remain among the foremost fields of research. This growth raises many research issues: capturing and accessing data effectively and quickly, processing it while still achieving high performance and high throughput, and storing it efficiently for future use.
Programming for high-performance data-intensive computing is an important and challenging issue. Expressing the data-access requirements of applications and designing programming-language abstractions to exploit parallelism are immediate needs. Application- and domain-specific optimisations are also part of a viable solution in data-intensive computing.
While these are only a few examples of open issues, research in data-intensive computing has become quite intense during the last few years, yielding strong results. Moreover, in a widely distributed environment, data is often not locally accessible and must therefore be retrieved and stored remotely. While traditional distributed systems work well for computation that requires limited data handling, they may fail in unexpected ways when the computation accesses, creates and moves large amounts of data, especially over wide-area virtualised cloud environments.
This special issue will contain significantly revised versions of selected papers from the IPDPS workshops we organised (the forthcoming 2014 International Workshop on High Performance Data Intensive Computing (pending), the 2013 International Workshop on High Performance Data Intensive Computing, and the 2012 International Workshop on High Performance Data Intensive Computing). However, we also welcome external submissions.
Topics include, but are not limited to, the following:
- Big data science and foundations, analytics, visualisation and semantics
- Software and tools for big data management
- Algorithm design, experimentation, prototyping and implementation
- Data-driven innovation, computational modelling and data integration
- Computing, scheduling and resource management for sustainability
- High performance distributed cache and optimisation
- High performance data transfer and ingestion
- NoSQL data stores
- Machine learning algorithms for big data
- Data-aware high performance data access toolkits and middleware
- Service oriented architectures for data-intensive computing
- Power and energy efficiency for data-intensive computing systems
- Programming models, abstractions for data-intensive computing
- Data capturing, management and scheduling techniques
- MapReduce, Hadoop, Spark and their applications in data-intensive computing
- Performance measurement, analytical modelling, simulation
- Distributed ensemble classifiers
Submission of Manuscripts: 1 July, 2014
Notification to Authors: 15 September, 2014
Final Versions Due: 15 October, 2014