Ken Krugler

Nevada City, California, United States

1K followers 500+ connections

Scale Unlimited

Massachusetts Institute of Technology

About

President of Scale Unlimited. Design, development and training for big data processing…

Articles by Ken

Almost a tech support hero

By Ken Krugler

Apr 16, 2014

Activity

We LOVE our repeat customers! PRC's latest install at Fort Independence Campground in Independence, CA was built to match the exterior of a previous…

Liked by Ken Krugler
If I had the opportunity to get bitten by a radioactive spider and get superpowers—whether they be the proportionate strength and speed of a spider…

Liked by Ken Krugler
Real-time analytics is very serious business, and Viktor and I are headed to #rtasummit together, no matter the cost. It's time to make peace with…

Liked by Ken Krugler

Experience & Education

Scale Unlimited

*** ****** ******** **********

****** *** **** *********
****** ******

******** ********
************* ********* ** **********

** ******** ******* *** ***********

1979 - 1983

Volunteer Experience

Contributor

Stack Overflow

Dec 2009 - Present 14 years 9 months

Education

I answer questions about Flink, Pinot, Lucene/Solr and Cascading. See https://stackoverflow.com/users/231762/kkrugler?tab=answers&sort=newest

Organizer, Teacher

Girls Who Code

Dec 2015 - May 2018 2 years 6 months

Education

Helped start the Girls Who Code club of Nevada County, taught the first session, and filled in for other teachers during subsequent sessions.

Volunteer Teacher

Bitney College Prep

Jan 2014 - Jun 2015 1 year 6 months

Education

I taught 20 high school students how to program using Python.

Volunteer Teacher

Seven Hills School

Mar 2013 - Jun 2013 4 months

Education

I taught computer programming to middle-schoolers.

Board Member

Bear Yuba Land Trust

Jan 2000 - Jan 2005 5 years 1 month

Environment

Publications

Building a scalable focused web crawler with Flink

Flink Forward SF 2018 April 16, 2018

Is it possible to build an efficient, focused web crawler using Flink? That was the question that led to the creation of the flink-crawler open source project. In this talk I’ll discuss how we use Flink’s support for AsyncFunctions and iterations to create a scalable web crawler that continuously and efficiently performs a focused web crawl with no additional infrastructure. I’ll also discuss some of the testing and debugging challenges encountered when using features such as AsyncFunctions and iterations.

Faster Workflows, Faster

ApacheCon Big Data NA 2016 May 9, 2016

Slides from my talk at ApacheCon Big Data 2016 in Vancouver. I described how we're defining complex ETL workflows using the Cascading API, then running them on Flink (using AWS Elastic MapReduce).

Fuzzy Entity Matching

Cassandra Summit 2014 September 11, 2014

I discuss in-depth a real-world use case for combining Hadoop, Cassandra & Solr to solve the problem of quickly matching a target person (entity) against a large corpus of hundreds of millions of potential matches.

Similarity at Scale

Hadoop Summit 2014 June 3, 2014

In this talk I describe use cases for both batch & real-time similarity, and discuss my experience using Hadoop and Solr to generate high quality results at scale for several different clients. I cover entity resolution (people & places), false identity detection (fraud), real-time recommendation systems and automatic document linking. These are all based on past projects for real customers. Techniques I discuss are feature extraction from text, distributed SimHash, and Solr-based real time similarity scoring.
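As a rough illustration of the SimHash technique named in this abstract, here is a minimal single-machine sketch in Python. It is not the implementation from the talk, which is not shown here: the whitespace/word tokenization, the use of MD5 as a stable token hash, the 64-bit fingerprint width, and the function names `simhash` and `hamming_distance` are all assumptions made for the example. A distributed version would presumably compute fingerprints per document in parallel, but that part is not sketched.

```python
import hashlib
import re

FINGERPRINT_BITS = 64  # assumed width; chosen to match the 8 digest bytes used below


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint built from the word tokens of `text`."""
    weights = [0] * FINGERPRINT_BITS
    for token in re.findall(r"\w+", text.lower()):
        # Stable per-token hash; MD5 is used only for determinism, not security.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(FINGERPRINT_BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, so a distance threshold (to be tuned per corpus) can serve as a cheap near-duplicate test before any more expensive scoring.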

Suicide Risk Prediction using Social Media and Cassandra

Cassandra Summit 2013 July 13, 2013

I describe a portion of an early-phase project that uses social media data (tweets, Facebook posts, etc.) from service personnel to predict suicide rates. There’s a lot of motivation to provide better data for military psychologists, since more military personnel wind up taking their own lives than are killed in the line of duty. By analyzing social media data that is voluntarily provided by personnel, plus a predictive analytics system, we can provide assessments that help mental health workers focus their time and energy on the most at-risk individuals. This project uses Cassandra as the scalable storage system for this social media data, which is then analyzed in a distributed environment using Hadoop. The project also uses the Solr search support from DataStax Enterprise to provide ways for users to dig into the underlying data, which is critical when understanding the assigned risk levels.

Faster, Cheaper, Better - Replacing Oracle with Hadoop and Solr

Hadoop Summit 2012 June 14, 2012

This talk is a distillation of experience with clients, where we use Hadoop to do off-line pre-processing of data, which then lets us use Solr as a NoSQL solution that provides faster query processing on less hardware, while adding additional search & faceting functionality.

A Very Short History of Big Data

BigDataCamp 2011 November 8, 2011

My lightning talk from the BigDataCamp in Washington, DC.

A (very) short intro to Hadoop

BigDataCamp 2011 November 7, 2011

A very short introduction to Hadoop, from the talk I gave at the BigDataCamp held in Washington, DC. Some of this content is also covered in the various big data classes we offer via on-site training (see http://www.scaleunlimited.com/training/).

Thinking at Scale with Hadoop

SDForum SAM SIG September 22, 2010

Presentation I gave at the SDForum SAM SIG (Software Architecture & Modeling) meeting. This talk provides a brief introduction to Map-Reduce & Hadoop, then discusses challenges of implementing complex data processing using low-level Map-Reduce support, and a number of solutions.

Elastic Web Mining

ACM Data Mining Unconference November 1, 2009

PDF version (with notes) of my talk at the ACM Data Mining Unconference. How to use an open source stack (Hadoop, Cascading, Bixo) in EC2 for cost effective, scalable and reliable web mining.

Patents

Static Code Scoring

Filed August 29, 2008 US 12/231242

A method of ranking source code search results, using static factors derived from file attributes, context, activity and link (usage) graph analysis.

Projects

Flink Web Crawler

Aug 2016 - Present

A continuous scalable web crawler built on top of Flink and crawler-commons, with bits of code borrowed from bixo.

Suicide prediction from social media activity

Feb 2012 - Present

Use social media (Facebook, Twitter, etc.) activity to predict which military personnel have the highest risk of suicide.

Uses Gigya to collect social media activity, stores it in Cassandra, and then applies multiple predictive analytics models via Hadoop/Cascading to calculate risk.

Display advertising analytics

Jun 2011 - Present

Provide back-end support for an analytics web site that helps advertisers and publishers optimize display advertising.

Process millions of crawled pages each day with Hadoop/Cascading to build an OLAP (online analytical processing) platform using Solr indexes. Apply classification and clustering algorithms to enhance results with IAB codes, similar advertisers, recommended publishers, etc.

Focused crawl/index for market analytics

May 2010 - Present

Provide back-end support for a market research platform that is used by analysts to define & create brand and topic-tracking reports.

System uses a focused crawler (with SVM classifiers) to find and extract content, particularly conversations about products on the web. The pipeline then does key term extraction, additional classification and clustering, and finally builds topic-specific search indexes that incorporate pipeline results for augmented search functionality.

Near real time web page similarity

Oct 2010 - May 2011

Find pages in a 20M+ corpus that were similar to an arbitrary target page. Do it in a few milliseconds, and support hundreds of requests/second on a single server.

This used a Cascading/Hadoop workflow to extract key terms from the corpus, and built a Solr index and "corpus map" for the near-real time similarity engine.

Languages

Japanese

Organizations

Nevada County Tech Connection

Member

Apr 2017 - Present

Supporting, connecting and showcasing the technology and digital media ecosystem as a thriving entity in Nevada County by facilitating events, educational and training opportunities, collaborative efforts and streamlined communication among businesses, talent, education providers, workforce development agencies, and the community at large.

The Apache Software Foundation

Member

Jan 2011 - Present

I'm a committer on the Tika content extraction project, focusing on HTML parsing, character encodings and language detection.

Recommendations received

3 people have recommended Ken

More activity by Ken

I am very grateful to the members of The Apache Software Foundation to let me rejoin their ranks! I had gone Emeritus for a few years as I…

Liked by Ken Krugler
Today, we're excited to launch a new project that has been months in the making: Keyboard and Quill - a carefully-crafted, playful, narrative podcast…

Liked by Ken Krugler
This is my advanced Flink class (at Flink Forward 2023). On the afternoon of the SECOND day. Usually there's a significant drop-off in energy when…

Shared by Ken Krugler
One more thing … Mojo🔥 is now available on Mac M1/M2 and it’s ⚡️🚀 fast! ⚡️🚀, download and try popular communities projects like llama.🔥 And don’t…

Liked by Ken Krugler
Join me David Anderson, Confluent & Onehouse for an awesome meetup during #flinkforward #flinkforward2023 in Renton, Washington! I'm excited to meet…

Liked by Ken Krugler
Just last week, we had the privilege of sharing some momentous news – Apache Pinot has achieved the remarkable milestone of graduating to version…

Liked by Ken Krugler
Latest SUP Streaming Updates for the People #podcast!!! I interview Robert Zych, engineer at Raft and #ApachePinot committer. Robert talks about…

Liked by Ken Krugler
I'll be sharing some new bits about Restate next week at Current Conference. https://lnkd.in/e8-NM8Rq Join me for the session "𝗦𝘁𝗿𝗲𝗮𝗺…

Liked by Ken Krugler
