[EDM, GED, DMS] roadmap +1: warm-up @2016, 2D indexing comparison with the heavyweight open-source solutions #lucene out-of-my-box @2017
| Documents [0.] | Images [1.] | Audios [2.] | Videos [3.] |
|---|---|---|---|
| 00.txt, 01.odt, 02.doc, 03.pdf, 04.ods, 05.xls, 06.odp, 07.ppt, 08.odt, 09.pdf | 10.jpg, 11.gif, 12.png | 20.mp3, 21.wav, 22.amr | 30.avi, 31.mp4, 32.mkv |
`Context` data and cluster ready.
Index [myDoc*.*]... Search [BigData|Big*]...
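The two search forms named above (the wildcard `Big*` and the term `BigData`) can be expressed as a Solr select URL and an Elasticsearch match body. The sketch below is only an illustration: the endpoints, field names and parameters are copied from the requests logged further down this page, while the function names are hypothetical and no network call is made. The only point it adds is that the spaces and commas in the Solr `sort` parameter need percent-encoding.

```python
import json
from urllib.parse import urlencode

# Endpoints copied from the requests logged on this page.
SOLR_SELECT = "http://jbd-vm01.jbdata.fr:8983/solr/myCollec-0/select"
ES_SEARCH = "http://jbd-vm01.jbdata.fr:9200/mydocs-idx/doc/_search?pretty"


def solr_select_url(query: str, rows: int = 100) -> str:
    """Build the Solr select URL (hypothetical helper), encoding spaces and commas."""
    params = {
        "indent": "on",
        "q": query,
        "fl": "id,a_s,a_i,a_f",
        "sort": "a_f asc,a_i asc",
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def es_match_body(term: str, size: int = 50) -> str:
    """Serialize the bool/match body (hypothetical helper) to POST to ES_SEARCH."""
    body = {
        "query": {
            "bool": {
                "must": [{"match": {"content": term}}],
                "must_not": [],
                "should": [],
            }
        },
        "from": 0,
        "size": size,
        "sort": [],
        "aggs": {},
    }
    return json.dumps(body)


def solr_num_found(response: dict) -> int:
    """Read the hit count from a parsed Solr JSON response."""
    return response["response"]["numFound"]


def es_total_hits(response: dict) -> int:
    """Read the hit count from a parsed Elasticsearch response."""
    total = response["hits"]["total"]
    # Plain integer in the response shown on this page; newer releases
    # return an object with a "value" key instead.
    return total["value"] if isinstance(total, dict) else total
```

Keeping the two hit-count readers separate makes it easy to put the engines' totals side by side once the JSON responses are parsed.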
| Rank | Engine (matched by hit count) | Score |
|---|---|---|
| #3 | Solr | 09/20 |
| #2 | Elasticsearch | 10/20 |
| #1 | Lucene out-of-my-box | 12/20 |

Legend: vertical limit | horizontal indexing end
Vertical and horizontal: two dimensions to improve the indexing surface.
| 12/20 | 10/20 | 09/20 |

    "http://jbd-vm01.jbdata.fr:8983/solr/myCollec-0/select?indent=on&q=Big*&fl=id,a_s,a_i,a_f&sort=a_f asc,a_i asc&rows=100&wt=json"
    {
      "responseHeader":{
        "zkConnected":true,
        "status":0,
        "QTime":7,
        "params":{
          "q":"Big*",
          "indent":"on",
          "fl":"id,a_s,a_i,a_f",
          "sort":"a_f asc,a_i asc",
          "rows":"100",
          "wt":"json"}},
      "response":{"numFound":9,"start":0,"docs":[
          { "id":".../dev/ged-06/input-20/myDoc-00.txt"},
          { "id":".../dev/ged-06/input-20/myDoc-01.odt"},
          { "id":".../dev/ged-06/input-20/myDoc-02.doc"},
          { "id":".../dev/ged-06/input-20/myDoc-03.pdf"},
          { "id":".../dev/ged-06/input-20/myDoc-04.ods"},
          { "id":".../dev/ged-06/input-20/myDoc-05.xls"},
          { "id":".../dev/ged-06/input-20/myDoc-06.odp"},
          { "id":".../dev/ged-06/input-20/myDoc-07.ppt"},
          { "id":".../dev/ged-06/input-20/myDoc-10.jpg"}]
      }}

    "http://jbd-vm01.jbdata.fr:9200/mydocs-idx/doc/_search?pretty" -d '{
      "query": {
        "bool": {
          "must": [ { "match" : { "content" : "BigData" } } ],
          "must_not": [],
          "should": []
        }
      },
      "from": 0,
      "size": 50,
      "sort": [],
      "aggs": {}
    }'
    {
      "took" : 23,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
      "hits" : {
        "total" : 10,
        ".../dev/ged-06/input-20/myDoc-07.ppt"
        ".../dev/ged-06/input-20/myDoc-01.odt"
        ".../dev/ged-06/input-20/myDoc-00.txt"
        ".../dev/ged-06/input-20/myDoc-10.jpg"
        ".../dev/ged-06/input-20/myDoc-05.xls"
        ".../dev/ged-06/input-20/myDoc-12.png"
        ".../dev/ged-06/input-20/myDoc-03.pdf"
        ".../dev/ged-06/input-20/myDoc-02.doc"
        ".../dev/ged-06/input-20/myDoc-06.odp"
        ".../dev/ged-06/input-20/myDoc-04.ods"

    java org.apache.lucene.demo.SearchFiles -index .../dev/ged-06/.lucene -query "big*"
    Searching for: big*
    12 total matching documents
    1. .../dev/ged-06/output-20/myDoc-05.txt
    2. .../dev/ged-06/output-20/myDoc-03.txt
    3. .../dev/ged-06/output-20/myDoc-21.txt
    4. .../dev/ged-06/output-20/myDoc-20.txt
    5. .../dev/ged-06/output-20/myDoc-00.txt
    6. .../dev/ged-06/output-20/myDoc-07.txt
    7. .../dev/ged-06/output-20/myDoc-02.txt
    8. .../dev/ged-06/output-20/myDoc-04.txt
    9. .../dev/ged-06/output-20/myDoc-12.txt
    10. .../dev/ged-06/output-20/myDoc-06.txt
    Press (n)ext page, (q)uit or enter number to jump to a page.
    n
    11. .../dev/ged-06/output-20/myDoc-01.txt
    12. .../dev/ged-06/output-20/myDoc-10.txt
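To see where the three scores come from, the result lists above can be compared as sets of document numbers. This is a rough sketch with one stated assumption: each Lucene hit `output-20/myDoc-NN.txt` is taken to be the extracted text of `input-20/myDoc-NN.*`. The names `SOLR`, `ELASTICSEARCH`, `LUCENE`, `score` and `ranking` are illustrative only, and the denominator 20 is taken from the "/20" notation used on this page.

```python
# Document numbers (the NN in myDoc-NN) listed in each engine's output above.
SOLR = {"00", "01", "02", "03", "04", "05", "06", "07", "10"}
ELASTICSEARCH = {"00", "01", "02", "03", "04", "05", "06", "07", "10", "12"}
# Assumption: output-20/myDoc-NN.txt is the text extracted from input-20/myDoc-NN.*
LUCENE = {"00", "01", "02", "03", "04", "05", "06", "07", "10", "12", "20", "21"}

# Denominator taken from the "/20" notation used on this page.
SCORE_BASE = 20


def score(hits: set) -> str:
    """Format a hit set as a score such as '09/20'."""
    return f"{len(hits):02d}/{SCORE_BASE}"


def ranking(results: dict) -> list:
    """Return engine names ordered from most to fewest documents found."""
    return sorted(results, key=lambda name: len(results[name]), reverse=True)


results = {"Solr": SOLR, "Elasticsearch": ELASTICSEARCH, "Lucene": LUCENE}

for rank, name in enumerate(ranking(results), start=1):
    print(f"#{rank} {name}: {score(results[name])}")

# Documents gained at each step up the ranking.
print("Elasticsearch adds:", sorted(ELASTICSEARCH - SOLR))
print("Lucene adds:", sorted(LUCENE - ELASTICSEARCH))
```

Under that assumption, the set differences show that Elasticsearch gains one image (12.png) over Solr, and the Lucene run additionally matches two audio files (20.mp3, 21.wav).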
Tika 1.15 upgrade and tuning to increase the indexing surface: ( vertical + 2 ) * ( horizontal - 1 ) = 13.
| Rank | Engine (matched by hit count) | Score |
|---|---|---|
| #3 | Solr | 09/20 |
| #2 | Elasticsearch | 10/20 |
| #1 | Lucene out-of-my-box | 13/20 |