User:Matthew.zellmer/bots
I want to make bots that will let me find and replace the same text on multiple pages. I can do this using the amazing API, PHP and cURL.
The wiki API can be used over HTTP (internet) to access the inside of the wiki: it can query and edit the wiki.
- It turns out that robots (bots) can only work through the API, so learn about it first
- API basics
- API:FAQ
- API homepage for Wikipedia
- API homepage for LOTRO; send requests here to run bots and scripts
- API:Tutorial
API action=query
- Everything you want to do must go through the API entry point: API entrypoint for LOTRO. This is where you send requests or point bots.
- You might need to log in to the API before it allows you to do anything: Login examples and Help
- The lotro-wiki is pretty open; the User page shows permissions to the write API
- Use an HTTP query string to do things. Passing no parameters returns the help page with the autogenerated documentation.
- API query: API query reference
- API Tutorials: Tutorial link
- These tutorials use PHP and the Zend client: PHP Zend Client Tutorial and Use
- Example query SYNTAX:
- Title search, returns images:
https://en.wikipedia.org/w/api.php?action=query&titles=San_Francisco&prop=images&imlimit=20&format=jsonfm
- Title search, returns basic info:
https://lotro-wiki.com/api.php?action=query&titles=Ered%20Luin&prop=info&format=xml
- Title search, returns basic info:
https://lotro-wiki.com/api.php?action=query&titles=User:Matthew.zellmer/testpage&prop=info&format=jsonfm
- Title search, returns the latest revision (add &rvprop=content to get the full page text):
https://lotro-wiki.com/api.php?action=query&titles=The%20Misty%20Mountains&prop=revisions&format=jsonfm
- Parts of a Query:
- action=query is used for most read actions; separate action= modules exist for write actions
- next will be one of these: prop=, list=, meta= (plus titles= to pick the pages)
- titles= takes one or more titles for the query to operate on (specifies pages)
- multiple titles with titles=Foo|Bar|Baz (this makes multiple lookups count as one call for the purpose of rate limiting)
- This works for pages but not revisions. Read the documentation via the Sandbox or via the api.php autodocs.
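A rough sketch of how these parts fit together when a script builds the query string. It is written in Python only to keep it short (the scripts described on this page are PHP and cURL); the helper name build_query_url is made up for illustration, and the endpoint is the LOTRO one from the examples above.

```python
from urllib.parse import urlencode

API_URL = "https://lotro-wiki.com/api.php"  # entry point used in the examples above


def build_query_url(titles, prop="info", fmt="xml", api_url=API_URL):
    """Build an action=query URL; several titles are joined with '|'."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # Foo|Bar|Baz goes out as one request
        "prop": prop,
        "format": fmt,
    }
    # urlencode percent-encodes '|' as %7C and turns spaces into '+'
    return api_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(["Ered Luin", "The Misty Mountains"]))
```

Letting a library do the encoding avoids hand-writing %20 and %7C in the URL.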
- list=search (sr) performs a full text search: API:Search
- Search Examples:
- Text search, returns title and snippets:
https://lotro-wiki.com/api.php?action=query&list=search&srwhat=text&srsearch=replaceMe&srlimit=50&sroffset=0&format=xml
- Title search, returns titles and snippets:
https://lotro-wiki.com/api.php?action=query&list=search&srwhat=title&srsearch=replaceMe&srlimit=50&sroffset=0&format=xml
- Several additional options to help search:
- srsearch= What to search for. Searches for all page titles (or content) that have this value
- srnamespace= The namespace(s) to enumerate. Values (separate with '|'): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109
- srwhat= Search inside the text or titles. One value: title, text, nearmatch
- srinfo= What metadata to return. Values (separate with '|'): totalhits, suggestion. Default: totalhits|suggestion
- srprop= What properties to return (this is huge). Values (separate with '|'). Default: size|wordcount|timestamp|snippet
- size - Adds the size of the page in bytes
- wordcount - Adds the word count of the page
- timestamp - Adds the timestamp of when the page was last edited
- score - Adds the score (if any) from the search engine
- snippet - Adds a parsed snippet of the page
- titlesnippet - Adds a parsed snippet of the page title
- redirectsnippet - Adds a parsed snippet of the redirect title
- redirecttitle - Adds the title of the matching redirect
- sectionsnippet - Adds a parsed snippet of the matching section title
- sectiontitle - Adds the title of the matching section
- hasrelated - Indicates whether a related search is available
- srredirects= Include redirect pages in the search
- sroffset= Use this value to continue paging (returned by the query). Default: 0
- srlimit= How many total pages to return. No more than 50 (500 for bots) allowed. Default: 10
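The search options can be put together the same way. Below is a small Python sketch (the function names are invented for this example) that builds a list=search URL and steps sroffset to page through results. Stepping by srlimit is a simplification; a real script would more likely reuse the offset the API hands back.

```python
from urllib.parse import urlencode

API_URL = "https://lotro-wiki.com/api.php"  # same endpoint as the examples above


def build_search_url(text, what="text", limit=50, offset=0, namespaces=(0,)):
    """Build a list=search URL from the sr* options described above."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": text,
        "srwhat": what,  # title, text or nearmatch
        "srnamespace": "|".join(str(n) for n in namespaces),
        "srlimit": limit,  # at most 50 for normal users
        "sroffset": offset,
        "format": "xml",
    }
    return API_URL + "?" + urlencode(params)


def search_pages(text, pages=3, limit=50):
    """Yield one URL per page of results by stepping sroffset."""
    for page in range(pages):
        yield build_search_url(text, limit=limit, offset=page * limit)


if __name__ == "__main__":
    for url in search_pages("replaceMe", pages=2):
        print(url)
```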
- All about prop=
- prop=images lists the images on a page
- multiple modules with &prop=images|templates&list=allpages|blocks
- prop=info for basic page info
- prop=revisions for page history
- prop=revisions&rvprop=content for page wikitext
- generators (kind of like UNIX pipes) with &titles=Foo&generator=links&prop=revisions
- action=parse for page HTML
- limit= sets the max # of results. Default is 10, 'max' works
- format= : xml, json, xmlfm (default), jsonfm (good for debugging). Examples: formats
- You can specify pages in the following ways:
- By name using the titles parameter, e.g. titles=Foo|Bar|Main_Page
- By page ID using the pageids parameter, e.g. pageids=123|456|75915
- By revision ID using the revids parameter, e.g. revids=478198|54872|54894545
- Most query modules will convert revision ID to the corresponding page ID. Only prop=revisions actually uses the revision ID itself.
- If you want to find sections from the table of contents, use section= with the index property; use 0 for the wikitext that comes before the first section header.
- get only the content of a page (wikitext)
- If you just want the raw wikitext without any other information whatsoever, it's best to use index.php's action=raw mode instead of the API: https://en.wikipedia.org/w/index.php?action=raw&title=Main_Page . Note that this will output plain wikitext without any formatting.
- To get more information about the page and its latest version, use the API: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main_Page .
- You can retrieve up to 50 pages per API request: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Main_Page|Articles . This also works with generators.
- get the HTML only content of a page
- If you just want the HTML, it's best to use index.php's action=render mode instead of the API: https://en.wikipedia.org/w/index.php?action=render&title=Main_Page .
- To get more information distilled from the wikitext at parse time (links, categories, sections, etc.), use the API parse module: https://en.wikipedia.org/w/api.php?action=parse&page=Main_Page . See also the documentation for the action=parse module (API:Parsing wikitext).
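Once a prop=revisions&rvprop=content query comes back, the wikitext still has to be dug out of the reply. The Python sketch below does that for a hand-written sample in the older JSON layout, where the text sits under the "*" key and a missing page appears under "-1". The sample data and the function name are assumptions made for illustration, not output captured from a wiki.

```python
# Hand-written sample in the legacy JSON layout (an assumption for this sketch)
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1234": {
                "pageid": 1234,
                "title": "Main Page",
                "revisions": [{"*": "Welcome to the [[wiki]]."}],
            },
            "-1": {"title": "No such page", "missing": ""},
        }
    }
}


def extract_wikitext(response):
    """Map each page title to its wikitext, or None when the page has no revisions."""
    result = {}
    for page in response["query"]["pages"].values():
        revisions = page.get("revisions")
        result[page["title"]] = revisions[0]["*"] if revisions else None
    return result


if __name__ == "__main__":
    for title, text in extract_wikitext(SAMPLE_RESPONSE).items():
        print(title, "->", text)
```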
API action=edit
- Changing wiki content: use action=edit to edit a page
- three-step process: query to get the current data and an edit token, modify the data locally, POST the data (API:Edit) back to the server
- performing the query and getting the token happen at the same time (use intoken=edit in the query)
- API tokens can only be asked for using specific queries; prop=info&titles=FindThisExactTitle is one of them
- example: https://en.wikipedia.org/w/api.php?action=query&titles=Foo&prop=info&intoken=edit for obtaining an edit token
- the edit token looks like this: "a6376ba5c8284f218c935aec5a038032+\\"
- the token has a +\ at the end (JSON output shows the backslash escaped, as +\\); URL-encoded, +\ becomes %2B%5C
- you should send the token back like this: "a6376ba5c8284f218c935aec5a038032%2B%5C"
- manipulate the data locally on your computer
- POST back to wiki, POST requests only: Diff between GET and POST
- POST and PHP
- From a browser page, the simplest way to send POST data is a web form
- So you need a local webpage that accepts the action=query data into a form, then uses the form to POST it back to the wiki
- This is what the Python scripts do, but they do most of the heavy lifting for you
- read the API documentation about action=edit here
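The token encoding and the POST step can be sketched like this. It is Python rather than the PHP used for the real scripts, the endpoint is a placeholder, and the function names are invented. Only the parameter names (action, title, text, token) come from the API. Nothing is sent; the code only builds the request.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_URL = "https://example.org/w/api.php"  # placeholder endpoint, not a real wiki


def encode_token(token):
    """Percent-encode a token so its trailing +\\ survives the trip."""
    return quote(token, safe="")


def build_edit_request(title, new_text, token, api_url=API_URL):
    """Build (but do not send) the POST request for action=edit."""
    body = urlencode({
        "action": "edit",
        "title": title,
        "text": new_text,  # the whole page text, already modified locally
        "format": "json",
        "token": token,  # urlencode escapes the +\ for us
    })
    # Passing data makes urllib treat this as a POST request
    return Request(api_url, data=body.encode("utf-8"))


if __name__ == "__main__":
    token = "a6376ba5c8284f218c935aec5a038032+\\"
    print(encode_token(token))  # a6376ba5c8284f218c935aec5a038032%2B%5C
    request = build_edit_request("User:Matthew.zellmer/testpage", "new text", token)
    print(request.get_method())  # POST
```

Encoding the token by hand is only needed when the query string is assembled manually; a form encoder handles it along with everything else.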
Bots and Robots access the API to perform actions
- Before you get too far into bots, remember they just use the API over and over again. Why can't I just do that manually?
- figure out the API first, ok DONE
- Basic info about Bots
- Category:Bots
- Manual on Bots
- Lotro-Wiki.com:Bots
- Start at the MediaWiki Manual:Bots.
- Most of these scripts are python scripts run using Pywikipediabot
- The Pywikipediabot bots make it so you can manipulate the data the API gives you and easily send it back
- User:Magill/Projects-Index list of Projects and RoBots
- User:Magill-bot/scripts/user-fixes.py
- User:Magill-bot/scripts/replace lotro.py
- User:Magill-bot
- User talk:Magill-bot
- User:RingTailBot
- User:RingTailCat/Sandbox-6 bot examples
- I could use Windows to run the Python scripts (Basic how-to), but I don't want to
- I want a PHP solution since I am running Windows; try a list from here when ready: API clients
My solution: Use HTML forms, PHP, JavaScript and AJAX to access the API
- 1. HTML form to build a GET query, use AJAX to send it to the API and get an XML response
- 2. format the returned response, so that I can select what to keep or throw out
- 3. perform a very specific GET query on each selected item and get an edit token at the same time
- 4. search the returned response and add/edit what I want locally using JavaScript or PHP
- 5. send a POST edit action back to the server with the changes using the edit token received earlier
AJAX fail
- AJAX won't do cross-server HTTP requests unless you set up Apache the right way or IE the right way
- discovered solution: I went with a simple one since I use IE (in Internet Options)
- make lotro-wiki.com a trusted site
- in the Trusted sites zone's custom security level, under Miscellaneous, enable "Access data sources across domains"
- Other solutions to the AJAX cross-server issue
- https://httpd.apache.org/docs/2.4/mod/mod_headers.html
- https://www.barneyparker.com/configure-apache-to-accept-cross-site-xmlhttprequests-on-debian/
- https://www.w3schools.com/php/php_ajax_php.asp
- https://www.w3schools.com/tags/ref_httpmessages.asp
- https://en.wikipedia.org/wiki/List_of_HTTP_headers#Requests
- https://en.wikipedia.org/wiki/Cross-origin_resource_sharing
- https://icodeguru.com/WebClient/Ajax-Hacks/0596101694/ID-15118.HeadA.Hack_74.html
- https://hacks.mozilla.org/2009/07/cross-site-xmlhttprequest-with-cors
- https://www.yourhtmlsource.com/javascript/ajax.html
- https://www.w3schools.com/ajax/ajax_examples.asp
- https://msdn.microsoft.com/en-us/library/ie/cc288060(v=vs.85).aspx
- even with AJAX and Apache set up properly, JavaScript variables refuse to properly carry multi-parameter URL text strings for submission in an API GET query.
- the problem is with the & character: JavaScript variables always (no matter the encoding or decoding) end up with &amp; instead
- several global functions don't fix this: eval(), encodeURIComponent(), encodeURI(), decodeURI(), decodeURIComponent(), escape(). So frustrating.
- best solution using AJAX would be to code specific queries in form textboxes with each API query element in its own textbox
- this does not allow dynamic queries. LAME, not going to use it for now
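One possible explanation, which these notes do not confirm, is that the query string was being read back out of HTML markup, where & is stored as the entity &amp;. That is HTML escaping rather than URL encoding, which would explain why the URL encode/decode functions had no effect. The Python snippet below only illustrates the difference between the two; in a browser, the analogous step would be reading the field's value rather than its HTML.

```python
import html
from urllib.parse import unquote

# A query string as it looks after being HTML-escaped (assumed cause of the problem)
escaped_query = "action=query&amp;list=search&amp;srsearch=replaceMe"

# URL-decoding leaves it untouched: &amp; is not a percent-escape
url_decoded = unquote(escaped_query)

# HTML-unescaping turns the entity back into a plain ampersand
fixed_query = html.unescape(escaped_query)

if __name__ == "__main__":
    print(url_decoded)  # action=query&amp;list=search&amp;srsearch=replaceMe
    print(fixed_query)  # action=query&list=search&srsearch=replaceMe
```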
Use cURL and PHP Instead of AJAX
- found out I can access the API using cURL instead, which is already a part of PHP
Robot, Bot, Scripts, Completed, Solution
- AJAX has limited use given the cross-domain restrictions; I can't get around them
- use other coding options
Build PHP scripts to perform API query only
- design:
- PHP webpage with textbox and form, must know API query language yourself, limited help
- form submits to PHP backend, which performs API query, makes a local copy of the returned XML results
- new page displays the results but you can checkmark the ones you want for further API query
- ended up using the PHP cURL for GET/POST requests to the server
- DONE - it's sloppy but it's done
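The "parse the returned XML, then show checkboxes" part of this design could look roughly like the Python sketch below. The sample XML is hand-written in the shape a list=search reply is assumed to have, and the form field name pages[] is made up; the actual PHP script is not shown on this page.

```python
import html
import xml.etree.ElementTree as ET

# Hand-written sample reply (an assumption about the layout, not captured output)
SAMPLE_XML = """<api>
  <query>
    <search>
      <p ns="0" title="Ered Luin" snippet="... replaceMe ..." />
      <p ns="0" title="The Misty Mountains" snippet="... replaceMe ..." />
    </search>
  </query>
</api>"""


def result_titles(xml_text):
    """Return the title attribute of every search hit in the reply."""
    root = ET.fromstring(xml_text)
    hits = root.findall("./query/search/*")
    return [hit.get("title") for hit in hits if hit.get("title")]


def checkbox_list(titles):
    """Render one HTML checkbox per title so the user can pick pages."""
    row = '<label><input type="checkbox" name="pages[]" value="{0}"> {0}</label>'
    return "\n".join(row.format(html.escape(t, quote=True)) for t in titles)


if __name__ == "__main__":
    print(checkbox_list(result_titles(SAMPLE_XML)))
```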
Built a PHP script that uses cURL to perform API login
- used to login and hold local information for future API edits
- required for any future API POST requests
- design:
- PHP website with a textbox and form to hold the API login request (URL)
- form submits a PHP cURL POST login request
- grab the returned login token (number)
- send the login again with the token using the PHP cURL POST
- verify login by displaying the outcome
- DONE, a bit of a slob on the coding
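The two-step login in this design (send the login, get a token back, send the login again with the token) can be sketched as follows. It is Python with an injected post function, so it runs without a wiki; fake_post and its canned replies are invented for the demo. A real transport would also have to keep the session cookies between the two requests.

```python
def login(post, username, password):
    """Log in with the two-step token handshake.

    `post` is any callable that sends a dict of parameters as a POST
    and returns the decoded JSON reply as a dict.
    """
    params = {"action": "login", "lgname": username,
              "lgpassword": password, "format": "json"}
    result = post(params)["login"]
    if result["result"] == "NeedToken":
        # Second request: same credentials plus the token from the first reply
        params["lgtoken"] = result["token"]
        result = post(params)["login"]
    return result["result"] == "Success"


def fake_post(params):
    """Stand-in for the wiki, used only to exercise the flow above."""
    if "lgtoken" not in params:
        return {"login": {"result": "NeedToken", "token": "abc123"}}
    ok = params["lgtoken"] == "abc123"
    return {"login": {"result": "Success" if ok else "WrongToken"}}


if __name__ == "__main__":
    print(login(fake_post, "ExampleBot", "secret"))  # True
```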
Build a PHP script that uses cURL to perform API edit on a single page
- design:
- use the test website https://testwiki.skunark.net/index.php
- perform API query to search for text
- use initial API query and make a checkbox to allow user a chance to CANCEL future edit
- have a form that will accept, find/replace text on the selected pages (textboxes), submit button
- perform API query GET to get full page text on a selected page and edit token
- find and replace text
- display old and new versions of the page text
- pop-up box/alert box to allow the user one last-ditch chance to CANCEL
- perform API edit POST submitting the entire page text (with new replaced text)
- DONE - again sloppy but done, no error handling
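For the "find and replace, then display old and new versions" steps, a compact alternative to showing both full texts is a line diff. This is a Python sketch with an invented function name, not the PHP that was actually written.

```python
import difflib


def replace_with_preview(old_text, find, replace):
    """Return the new text plus a unified diff the user can review before the edit."""
    new_text = old_text.replace(find, replace)
    diff_lines = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old", tofile="new", lineterm="")
    return new_text, "\n".join(diff_lines)


if __name__ == "__main__":
    page = "Ered Luin is a region.\nSee also: replaceMe\nEnd of page."
    new_page, preview = replace_with_preview(page, "replaceMe", "Thorin's Gate")
    print(preview if preview else "NO CHANGES MADE")
```

An empty diff means nothing matched, which is a natural point to skip the edit POST.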
Build a PHP script that uses cURL to perform API edit on multiple pages
- design:
- use the test website https://testwiki.skunark.net/index.php
- perform API query to search for text, return multiple results
- use initial API query to make a list that user can select (checkbox) pages from a previous query
- have a form that will accept, find/replace text on the selected pages (textboxes), submit button
- perform API query GET to get full page text on a selected page and edit token
- find and replace text
- display old and new versions of the page text
- perform API edit POST submitting the entire page text (with new replaced text)
- DONE, but it's sloppy; the query doesn't work right in finding exact pages yet either
Make my scripts able to perform multiple search/find/replace for each page to be edited
- design:
- use previous scripts with single find/replace and modify them
- create more find/replace text boxes
- loop thru each find and replace text instead of just the single find/replace text
- output success for each find/replace or output NO CHANGES MADE
- only execute the API edit if changes were made
- display results of API edit if performed
- removed the textarea that displays the entire page with changes (it's just too much to review)
- DONE! even added a tiny bit of debugging code, but for the most part it crashes terribly still
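The loop described in this design (several find/replace pairs, a result line for each, and an edit only when something changed) might look like the sketch below. It is Python for brevity, and the function name and report wording are illustrative only.

```python
def apply_replacements(text, pairs):
    """Apply each (find, replace) pair in order and report what happened."""
    report = []
    for find, replace in pairs:
        count = text.count(find) if find else 0  # ignore empty search boxes
        if count:
            text = text.replace(find, replace)
            report.append("{!r}: replaced {} time(s)".format(find, count))
        else:
            report.append("{!r}: NO CHANGES MADE".format(find))
    return text, report


if __name__ == "__main__":
    original = "Orc camp near the Orc road"
    updated, report = apply_replacements(
        original, [("Orc", "Goblin"), ("Elf", "Man")])
    print("\n".join(report))
    if updated != original:
        print("changes found: send the API edit")
    else:
        print("nothing changed: skip the API edit")
```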
Make my scripts able to perform search/find and delete entire line for each page to be edited
- design:
- use previous scripts with single find/replace and modify
- create a form checkbox that, if checked will delete the entire line if something on it matched the find
- output success for each line deleted or output NO CHANGES MADE
- only execute the API edit if changes were made
- display results of API edit if performed
- DONE!
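The delete-the-whole-line option can be sketched in a few lines. Again this is Python with an invented function name, showing the idea rather than the PHP that was written.

```python
def delete_matching_lines(text, find):
    """Drop every line containing `find`; return the new text and the number removed."""
    if not find:
        return text, 0  # an empty search string would otherwise match every line
    lines = text.splitlines(keepends=True)
    kept = [line for line in lines if find not in line]
    return "".join(kept), len(lines) - len(kept)


if __name__ == "__main__":
    page = "keep this\n* replaceMe entry\nkeep this too\n"
    new_page, removed = delete_matching_lines(page, "replaceMe")
    print("{} line(s) deleted".format(removed) if removed else "NO CHANGES MADE")
```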
test wiki site
- test scripts here
- testwiki.skunark.net
- lotroadmin will have to give you bot rights on that.
- I also don't know the current status of the test wiki, as it was "munged" as part of the server switch/upgrade last month (Dec 2013).