DOI: 10.1109/NOMS.2016.7502925
research-article

ENTRADA: A high-performance network traffic data streaming warehouse

Published: 01 April 2016

    Abstract

    We present ENTRADA, a high-performance data streaming warehouse that enables researchers and operators to analyze vast amounts of network traffic and measurement data within interactive response times (seconds to a few minutes), even on a small computer cluster. ENTRADA delivers such performance by employing an optimized file format and a high-performance query engine, both open-source. ENTRADA has been operational for more than 1.5 years, having ingested more than 100 TB of pcap files from two .nl authoritative DNS servers. As we discuss, we use this data in projects that aim to further increase the security and stability of the .nl zone. In this paper, we present our design choices, experiences, and a performance evaluation of ENTRADA. Finally, we open-source ENTRADA, which can be used “out-of-the-box” by researchers, operators, and registries to deploy their own network analysis clusters for DNS traffic, and can be easily extended to handle any other structured data.

    Cited By

    • (2021) TsuNAME. Proceedings of the 21st ACM Internet Measurement Conference, pp. 398-418. DOI: 10.1145/3487552.3487824. Online publication date: 2-Nov-2021.

    Published In

    NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium
    1323 pages

    Publisher

    IEEE Press
