If you've previously redirected URLs, you may still have links to the old URL on your website. If you're looking for a list of all of those links, the current page they point to (the old URL), and the page they should point to (the new URL), you're in the right place.
The method below may be a useful first step to updating internal links to the old URLs.
Updating old links spares visitors and crawlers an extra redirect hop, which can help increase page loading speed and make crawls of your site more efficient.
Why this request isn't as straightforward as it sounds
This request mixes data from different sources: page data and links data. At the moment, it isn't possible to request information from both sets of data at the same time.
That's okay: you can still obtain the data you want from each set of data, and then use different tricks to join the results together.
How to get a list of all pages that have been redirected and the URL they are redirected to
Navigate to the Data Explorer under Tools in the crawl results.
Use the Oncrawl Query Language to set a filter for all pages with a 301 status code.
Remove all columns except Full URL and Redirect final target. (If Redirect final target isn't present, you can add it using the + button.)
Click Export data at the top of the page.
How to get a list of all links to pages that have been redirected
Navigate to the Data Explorer under Tools in the crawl results.
Change the dataset to Links and click on the quickfilter labeled Pages pointing to 3xx errors.
Remove all columns except Origin: Full URL and Target: Full URL.
Click Export data at the top of the page.
How to merge the files
Using VLOOKUP in Excel
Open a new Excel workbook.
Paste the contents of your first file (the pages) into the first sheet. Name the sheet "Pages".
Create a new sheet. Paste the contents of your second file (the links) into a second sheet and name it "Links".
Add a column to the "Links" sheet and name the column "New location".
In cell C2 of the "Links" sheet, use VLOOKUP to search the list of pages on the "Pages" sheet for the link destination shown in B2.
Here is the full formula:
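=VLOOKUP(B2;Pages!A$1:B$1000;2;FALSE)
Make sure that the "1000" in B$1000 is the number of the last line in the list of pages. If your version of Excel separates arguments with commas rather than semicolons, replace each ; with a comma.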
Copy cell C2 and paste it into the rest of the cells in column C.
Don't forget to save the Excel file.
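If you would rather script this lookup than use a spreadsheet, the same exact-match logic can be sketched in Python with only the standard library. This is a minimal illustration, not an Oncrawl feature: it assumes both exports are comma-separated with a header row and keep the column order described above, and the sample URLs and the function name add_new_location are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for the two exports.
PAGES_CSV = """Full URL,Redirect final target
https://example.com/old-a,https://example.com/new-a
https://example.com/old-b,https://example.com/new-b
"""

LINKS_CSV = """Origin: Full URL,Target: Full URL
https://example.com/,https://example.com/old-a
https://example.com/blog,https://example.com/old-b
https://example.com/blog,https://example.com/old-c
"""


def add_new_location(pages_file, links_file):
    """Return the links rows with a third "New location" column."""
    pages = csv.reader(pages_file)
    next(pages)  # skip the header row
    # Lookup table: redirected URL -> final target of the redirect.
    new_location = {row[0]: row[1] for row in pages if len(row) >= 2}

    links = csv.reader(links_file)
    header = next(links)
    result = [header[:2] + ["New location"]]
    for row in links:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        origin, target = row[0], row[1]
        # Like VLOOKUP with FALSE: exact match only, "#N/A" when not found.
        result.append([origin, target, new_location.get(target, "#N/A")])
    return result


rows = add_new_location(io.StringIO(PAGES_CSV), io.StringIO(LINKS_CSV))
for row in rows:
    print(row)
```

To use it on real exports, pass two files opened with open(path, newline="", encoding="utf-8") instead of the io.StringIO samples, and write the returned rows out with csv.writer.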
Using the csvjoin tool in the csvkit Python package
Download and install the csvkit Python package, available on GitHub.
Full documentation for the csvjoin tool is also available.
You'll need to ask it to join the two files by matching the second column in the first file (the URL linked to) with the first column in the second file (the URL that was redirected). This means the links export is the first file you pass to csvjoin, and the pages export is the second.
You'll need:
The location of the first file, which will look something like:
~/Downloads/export-5975c7e1451c953ed90d7b7c-custom_query.csv
The location of the second file, which will look something like:
~/Downloads/export-5975c7e1451c953ed90d7b7c-custom_query\ \(1\).csv
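Note the extra backslash \ before the space and before the opening and closing parentheses.
The option -c 2,1, which tells csvjoin to use column 2 of the first file and column 1 of the second file as the join columns.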
Here is the full command line:
csvjoin -c 2,1 ~/Downloads/export-5975c7e1451c953ed90d7b7c-custom_query.csv ~/Downloads/export-5975c7e1451c953ed90d7b7c-custom_query\ \(1\).csv > results_origin_target_location.csv
This may take a while, depending on the number of lines in the first file, since each of them needs to be looked up in the second file.
This will create a file called "results_origin_target_location.csv" with a line for each link. Each line lists the link origin, the page to which it links, and the page it is redirected to.
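If you run this join regularly, one option is to launch csvjoin from a short Python script instead of typing the command. The sketch below is an assumption-laden illustration: build_csvjoin_command and run_csvjoin are hypothetical helper names, csvkit must already be installed, and the only csvjoin option used is the -c 2,1 shown in the command line above. Because the arguments are passed as a list rather than through a shell, the space and parentheses in the file name need no backslashes.

```python
import subprocess
from pathlib import Path


def build_csvjoin_command(first_file, second_file):
    """Build the argument list for: csvjoin -c 2,1 first_file second_file."""
    # expanduser() replaces a leading "~" with the home directory,
    # which the shell would otherwise do for us.
    first = str(Path(first_file).expanduser())
    second = str(Path(second_file).expanduser())
    return ["csvjoin", "-c", "2,1", first, second]


def run_csvjoin(first_file, second_file, output_path):
    """Run csvjoin and save its output (plays the role of "> file")."""
    command = build_csvjoin_command(first_file, second_file)
    with open(output_path, "wb") as output:
        # check=True raises CalledProcessError if csvjoin reports a failure.
        subprocess.run(command, stdout=output, check=True)


# Example call, using the file names from this article (not run here):
# run_csvjoin(
#     "~/Downloads/export-5975c7e1451c953ed90d7b7c-custom_query.csv",
#     "~/Downloads/export-5975c7e1451c953ed90d7b7c-custom_query (1).csv",
#     "results_origin_target_location.csv",
# )
```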