
I am using Elasticsearch 2.2.

Here is the document count:

curl 'xxxxxxxxx:9200/_cat/indices?v'

yellow open   app                 5   1   28019178         5073     11.4gb         11.4gb

In the "app" index we have two types of documents:

  1. "log"
  2. "syslog"

Now I want to delete all the documents under the type "syslog".

Hence, I tried the following command:

 curl -XDELETE "http://xxxxxx:9200/app/syslog"

But I am getting the following error:

No handler found for uri [/app/syslog]

I have installed the delete-by-query plugin as well. Is there any way I can do a bulk delete operation?

For now, I am deleting records one at a time by fetching the ID:

curl -XDELETE "http://xxxxxx:9200/app/syslog/A121312"

It took around 5 minutes to delete 10,000 records, and I have more than 1,000,000 docs that need to be deleted. Please help.

[EDIT 1]

I ran the query below to delete the syslog-type docs:

curl -XDELETE 'http://xxxxxx:9200/app/syslog/_query' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  }
}'

And the result is below:

{"found":false,"_index":"app","_type":"syslog","_id":"_query","_version":1,"_shards":{"total":2,"successful":1,"failed":0}}

I used a query to fetch this sample message from the index:

 {
      "_index" : "app",
      "_type" : "syslog",
      "_id" : "AVckPMQnKYIebrQhF556",
      "_score" : 1.0,
      "_source" : {
        "message" : "some test message",
        "@version" : "1",
        "@timestamp" : "2016-09-13T15:49:04.562Z",
        "type" : "syslog",
        "host" : "1.2.3.4",
        "priority" : 0,
        "severity" : 0,
        "facility" : 0,
        "facility_label" : "kernel",
        "severity_label" : "Emergency"
      }
 }

[EDIT 2]

The delete-by-query plugin is listed as installed:

sudo /usr/share/elasticsearch/bin/plugin list
Installed plugins in /usr/share/elasticsearch/plugins/node1:
    - delete-by-query
• If the document count is large, you should rather create a new index and reindex the documents you want to keep. – Commented Sep 16, 2016 at 8:55

4 Answers


I had a similar problem after filling Elasticsearch with 77 million unwanted documents over the last couple of days. Setting a timeout in the query is your friend, as mentioned here. curl has a parameter to increase its timeout too (-m 3600):

curl --request DELETE \
  --url 'http://127.0.0.1:9200/nadhled/tree/_query?timeout=60m' \
  --header 'content-type: application/json' \
  -m 3600 \
  --data '{"query":{
            "filtered":{
              "filter":{
                "range":{
                  "timestamp":{
                    "lt":1564826247
                   },
                  "timestamp":{
                    "gt":1564527660
                  }
                }
              }
            }
          }
        }'
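
Before firing a long-running delete like this, it can be worth checking how many documents the query actually matches. A quick sketch against the same (hypothetical) index and field, using a plain range query with the _count API:

curl -XGET 'http://127.0.0.1:9200/nadhled/tree/_count' \
  --header 'content-type: application/json' \
  --data '{"query":{"range":{"timestamp":{"gt":1564527660,"lt":1564826247}}}}'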

I know this is not your bulk delete, but I found this page during my research, so I am posting it here. Hope it helps you too.




In the latest Elasticsearch (5.2), you can use _delete_by_query:

curl -XPOST "http://localhost:9200/index/type/_delete_by_query" -H 'Content-Type: application/json' -d'
{
    "query":{
        "match_all":{}
    }
}'

The delete-by-query API is new and should still be considered experimental. The API may change in ways that are not backwards compatible.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
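
Since the OP has over a million documents, it may also help to run the delete asynchronously and poll its progress through the task API. A minimal sketch, assuming ES 5.x and the same placeholder index/type as above; the <nodeId>:<taskNumber> value is a placeholder for the task id the first call returns:

curl -XPOST "http://localhost:9200/index/type/_delete_by_query?wait_for_completion=false" -H 'Content-Type: application/json' -d'
{
    "query":{
        "match_all":{}
    }
}'

# The call above returns {"task":"<nodeId>:<taskNumber>"}; poll until "completed" is true
curl -XGET "http://localhost:9200/_tasks/<nodeId>:<taskNumber>"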



ES 8.11, 2024-01

I don't know what the situation was in 2016, but maybe you could consider doing a bulk delete. What is being described in the other answers at this writing is a _delete_by_query, not a bulk delete. Study the _bulk API page to understand how bulk operations work.

The downside of this is that it might be quite complicated to determine the _ids of all the Lucene documents (index documents) you need to delete. Typically you have to run a _search query to find these _ids on the basis of your query; you must have these _ids to do a bulk delete.

Someone might say: "ah, but if you have to run a _search, then sort out the _ids, and then do a bulk delete, surely that's going to be terribly, terribly slow". I'm not at all sure that's right: a _delete_by_query turns out to be a very lengthy operation, as the OP has found, when it is applied to a large number of records, whereas a _search might return results very quickly indeed. Parsing the results, and simultaneously creating the bulk delete string, could also be very quick1. And then ES executes these bulk operations incredibly fast.

You also have to build a bulk string conforming to the strict format required. A bulk delete operation is simpler than a bulk index operation; it just looks like this:

{ "delete" : { "_id" : "234" } }
{ "delete" : { "_id" : "235" } }
{ "delete" : { "_id" : "236" } }
...

(This is the "application/x-ndjson" format you have to use).
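
To make the whole pipeline concrete, here is a minimal shell sketch under some loud assumptions: the OP's index name app, a "type": "syslog" field to match on (use type.keyword instead if the field is dynamically mapped text), jq installed, and a single 10,000-hit page covering everything; otherwise you would paginate with search_after or a point-in-time:

# 1. Collect the matching _ids and turn each one into a bulk delete action line
curl -s -XPOST 'http://localhost:9200/app/_search' \
  -H 'Content-Type: application/json' \
  --data '{"size":10000,"_source":false,"query":{"term":{"type":"syslog"}}}' \
| jq -r '.hits.hits[]._id' \
| sed 's/.*/{ "delete" : { "_id" : "&" } }/' > bulk.ndjson

# 2. Submit the bulk delete; --data-binary preserves the newlines _bulk requires
curl -s -XPOST 'http://localhost:9200/app/_bulk' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @bulk.ndjson

Repeat the two steps until the _search returns no hits.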


1 The choice of programming language might even come into play here, particularly if you can parallelise the parsing of the results from the _search. If I had to do this in a Python project, I would seriously think about using a Rust module. But Python's limitations in terms of execution time shouldn't be exaggerated.



I would suggest that you rather create a new index and reindex the documents you want to keep; a sketch of that route is at the end of this answer.

But if you want to use delete-by-query, you should use this:

curl -XDELETE 'http://xxxxxx:9200/app/syslog/_query' -d '

{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  }
}'

But then you'll be left with the mapping.
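
If you go the reindex route instead (ES 2.x predates the _reindex API), the usual shape is a scroll over the documents you want to keep, feeding _bulk into a fresh index. A rough sketch, assuming a target index app_v2 (a made-up name) already exists with the right mappings, and that jq is available:

# Open a scroll over the docs to keep (type "log" only)
curl -s -XGET 'http://xxxxxx:9200/app/log/_search?scroll=1m' -d '{"size":500,"query":{"match_all":{}}}' > page.json

while true; do
  SCROLL_ID=$(jq -r '._scroll_id' page.json)
  [ "$(jq '.hits.hits | length' page.json)" -eq 0 ] && break

  # One action line plus one source line per hit, as _bulk expects
  jq -c '.hits.hits[] | {index:{_index:"app_v2",_type:._type,_id:._id}}, ._source' page.json > bulk.ndjson
  curl -s -XPOST 'http://xxxxxx:9200/_bulk' --data-binary @bulk.ndjson > /dev/null

  # Next page: ES 2.x accepts the raw scroll id as the request body
  curl -s -XGET 'http://xxxxxx:9200/_search/scroll?scroll=1m' -d "$SCROLL_ID" > page.json
done

Once app_v2 looks right, drop the old index and point an alias (or your clients) at the new one.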

Comments

• The delete query which you have given above will delete only A121312. I want to delete all docs under index "app" and type "syslog".
• This isn't working. Let me edit my post now with the query, result, and a sample message, if that would help.
• I have given a sample message as well. I am not sure why the query is returning found=false even though we have syslog messages.
• Have you installed the delete-by-query plugin?
• It's taking _query as the _id; check whether the delete-by-query plugin is installed.
