— Next episode of the /usr/bin/grep saga —
The incontournable logs aggregation Hortonworks, MapR and myDistrib
2.6.X has been continued #1@2015 (
2.7.1), #2@2016 (
2.7.2), #2.1@2016.

&(ElasticSearch->grep) = &(*ElasticSearch.grep) ?
Spark cluster ready.
From oldschool grep to org.apache.hadoop.examples.Grep (grep 2.0) to xgrep.py (grep 2.1).

MapReduce grep 2.0, already oldschool in the BigData timescale ?
hadoop org.apache.hadoop.examples.Grep hdfs://vm01.jbdata.fr:9000/logstash/2016-05-12 hdfs://vm01.jbdata.fr:9000/_output "INFO"
Spark-Python grep 2.1, soon oldschool in the roadmap ?
/usr/local/spark/bin/spark-submit /data31tech/dev/spark-00/xgrep.py /test/lstash-00/2016-06-10/hadoop-hduser-datanode-vm01.log ERROR
#
# @author JBD-2016-07
# http://data31tech.fr
#
from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.sql import SparkSession
if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: xgrep file pattern", file=sys.stderr)
exit(-1)
aSparkContext = SparkContext(appName="xgrep@JBD")
aFile = aSparkContext.textFile(sys.argv[1])
theErrors = aFile.filter(lambda line, pattern=sys.argv[2] : pattern in line)
print("Results#SparkContext:")
print(theErrors)
print("Number of {} : {}".format(sys.argv[2], theErrors.count()))
# for anItems in theErrors.collect():
# print(anItems)
aSparkContext.stop()
2nd level, SparkSession release. 3rd level, RDD for another CaseStudy#*.
#
# @author JBD-2016-07
# http://data31tech.fr
#
from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.sql import SparkSession
if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: xgrep file pattern", file=sys.stderr)
exit(-1)
aSparkSQLSession = SparkSession.builder.appName("xgrep@JBD").getOrCreate()
theLines = aSparkSQLSession.read.text(sys.argv[1])
theErrors = theLines.filter("value like '%ERROR%'")
print("Results#SparkSQLSession:")
print(theLines)
print(theErrors)
theLines.show()
theErrors.show()
aSparkSQLSession.stop()
From ASM, C/C++, SDK32/MFC, JAVA, 2D/3D, PHP, Androïd, HTML5 to Pig Latin, jruby, python by the way Shell(s), PERL, IDL, OCCAM, LISP, PROLOG, APT, TODO#?